# questions
Tomás Rojas:
Hi team, a very basic question about `PartitionedDataSet`s. I noticed they return a dictionary of bound methods for loading each partition. My question is: is there a way to write the nodes simply as a function of the object returned by each bound method, or should I write the nodes against the dictionary?
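For what it's worth, a node does get written against the dictionary: Kedro passes a mapping of partition id to a no-argument load function, and the node calls each function itself. A minimal sketch of that pattern (the node name and the list-valued partitions are illustrative, not from this thread):

```python
from typing import Any, Callable, Dict, List


def concat_partitions(partitions: Dict[str, Callable[[], List[Any]]]) -> List[Any]:
    """Combine every partition, loading each one only when its function is called."""
    combined: List[Any] = []
    for partition_id, load_func in sorted(partitions.items()):
        combined.extend(load_func())  # the partition is read from storage here
    return combined
```

Sorting the partition ids just makes the output order deterministic; nothing in Kedro requires it.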
Tom C:
A follow-on question: is there a way to process the bound methods concurrently via native Kedro, or do I need to write custom concurrency logic? I'd love to pass `--runner=ParallelRunner` and have the processing of the partitions be concurrent natively within Kedro. I appreciate that it's likely going to be something that should be tailored to each problem. Perhaps I'll utilise `asyncio.gather` and at least minimise the time spent waiting during each data-loading operation.
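If it helps, the `asyncio.gather` idea can be sketched like this: each synchronous load function is pushed onto a worker thread with `asyncio.to_thread` (Python 3.9+), so the waits overlap. `load_all_partitions` is a hypothetical helper, not Kedro API:

```python
import asyncio
from typing import Any, Callable, Dict


async def load_all_partitions(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every partition's load function concurrently on worker threads."""
    ids = list(partitions)
    results = await asyncio.gather(
        *(asyncio.to_thread(partitions[pid]) for pid in ids)
    )
    return dict(zip(ids, results))
```

Usage would be something like `loaded = asyncio.run(load_all_partitions(partitions))` inside the node. Note this overlaps I/O waits only; CPU-bound loads won't speed up on threads.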
datajoely:
@Tom C if you run `kedro run --async` it will load datasets concurrently IIRC, but due to Python weirdness you can't use async and `ParallelRunner` at the same time.
👍 1
@Tomás Rojas you get bound lazy functions on read, but you can return EITHER the eagerly loaded partitions OR lazy bound functions on save.
It's not quite clear from your question what you're looking for?
Tom C:
@datajoely, to clarify: will `--async` load catalogue entries AND `PartitionedDataSet` partitions in the async event loop? Or will it run async across the catalogue but sync within each catalogue entry?
Tomás Rojas:
@datajoely thanks for your answer. How can I load the partitions without using the bound functions? In this case I end up calling the functions anyway, so it makes no difference.
Tom C:
I think you have to use the functions, Tomás. The examples in the docs all show a `for` loop loading the partitions. I assume this is to allow a memory-safe way to iterate through the partitions and process each one.
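To illustrate that memory-safe pattern: each partition is loaded, reduced, and released before the next one is touched, so only one partition's data is resident at a time. A toy sketch with list-valued partitions (the names are illustrative):

```python
from typing import Callable, Dict, List


def running_total(partitions: Dict[str, Callable[[], List[int]]]) -> int:
    """Process partitions one at a time so only one is in memory at once."""
    total = 0
    for partition_id, load_func in sorted(partitions.items()):
        data = load_func()   # load just this partition
        total += sum(data)   # reduce it; the data can then be garbage-collected
    return total
```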
Tomás Rojas:
Thank you @Tom C, I'll keep it that way then 🙂
👍 1