# questions
Tomás Rojas:
Hi team, a very basic question about `PartitionedDataSet`s. I noticed they return a dictionary of bound methods for loading each partition. My question is: is there a way to write the nodes simply as a function of the object returned by each bound method, or should I write the nodes against the dictionary?
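For what it's worth, a node does get written against the dictionary: Kedro passes a mapping of partition id to a no-argument load function, and the node calls each function itself. A minimal sketch of that pattern (the node name and the list-valued partitions are illustrative, not from this thread):

```python
from typing import Any, Callable, Dict, List


def concat_partitions(partitions: Dict[str, Callable[[], List[Any]]]) -> List[Any]:
    """Combine every partition, loading each one only when its function is called."""
    combined: List[Any] = []
    for partition_id, load_func in sorted(partitions.items()):
        combined.extend(load_func())  # the partition is read from storage here
    return combined
```

Sorting the partition ids just makes the output order deterministic; nothing in Kedro requires it.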
Tom C:
A follow-on question: is there a way to process the bound methods concurrently via native Kedro, or do I need to write custom concurrency logic? I'd love to pass `--runner=ParallelRunner` and have the processing of the partitions be concurrent natively within Kedro. I appreciate that it's likely going to be something that should be tailored to each problem. Perhaps I'll utilise `asyncio.gather` and at least minimise the time spent waiting during each data-loading operation.
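If it helps, the `asyncio.gather` idea can be sketched like this: each synchronous load function is pushed onto a worker thread with `asyncio.to_thread` (Python 3.9+), so the waits overlap. `load_all_partitions` is a hypothetical helper, not Kedro API:

```python
import asyncio
from typing import Any, Callable, Dict


async def load_all_partitions(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every partition's load function concurrently on worker threads."""
    ids = list(partitions)
    results = await asyncio.gather(
        *(asyncio.to_thread(partitions[pid]) for pid in ids)
    )
    return dict(zip(ids, results))
```

Usage would be something like `loaded = asyncio.run(load_all_partitions(partitions))` inside the node. Note this overlaps I/O waits only; CPU-bound loads won't speed up on threads.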
datajoely:
@Tom C if you run `kedro run --async` it will load datasets concurrently IIRC, but due to Python weirdness you can't use async and `ParallelRunner` at the same time.
👍 1
@Tomás Rojas you get bound lazy functions on read, but you can return EITHER the eagerly loaded partitions OR lazy bound functions on save.
It's not quite clear from your question what you're looking for?
Tom C:
@datajoely, to clarify: will `--async` load catalogue entries AND `PartitionedDataSet` partitions in the async event loop? Or will it run async across the catalogue but sync within each catalogue entry?
Tomás Rojas:
@datajoely thanks for your answer. How can I load the partitions without using the bound functions? In this case I end up calling the functions anyway, so it makes no difference.
Tom C:
I think you have to use the functions, Tomás. The examples in the docs all show a `for` loop loading the partitions. I assume this is to allow a memory-safe way to iterate through the partitions and process each one.
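To illustrate that memory-safe pattern: each partition is loaded, reduced, and released before the next one is touched, so only one partition's data is resident at a time. A toy sketch with list-valued partitions (the names are illustrative):

```python
from typing import Callable, Dict, List


def running_total(partitions: Dict[str, Callable[[], List[int]]]) -> int:
    """Process partitions one at a time so only one is in memory at once."""
    total = 0
    for partition_id, load_func in sorted(partitions.items()):
        data = load_func()   # load just this partition
        total += sum(data)   # reduce it; the data can then be garbage-collected
    return total
```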
Tomás Rojas:
Thank you @Tom C, I'll keep it that way then 🙂
👍 1