#questions

Adrien

01/11/2024, 4:58 PM
Hello guys, I have a problem with a Kedro node definition. To preprocess my data, I use a Dask cluster in one of my nodes. My problem: for each parallel task, I need the output path, which is not accessible inside a node's function. Has anyone solved this problem?

datajoely

01/11/2024, 5:04 PM
it’s less of a problem and more that Kedro intentionally separates business logic from I/O logic
we don’t really support conditional flow based on filepath outputs
there is a belief that the combinatorial complexity leads to headaches, which is why we steer people away from it
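To make that separation concrete, here is a minimal pure-Python sketch that assumes nothing about Kedro internals. The node function (hypothetical preprocess_clips) only transforms data and returns a dict keyed by clip id. A separate I/O object (hypothetical PartitionedWriter) owns the base directory and turns each key into a file path. This mirrors the idea behind Kedro's PartitionedDataset, where a node returns a dict of partition ids to data and the catalog entry decides where each partition is written. The class here is an illustrative stand-in, not that API.

```python
import tempfile
from pathlib import Path


class PartitionedWriter:
    """Hypothetical I/O object: the only place that knows about file paths."""

    def __init__(self, base_dir: str, suffix: str = ".bin") -> None:
        self.base_dir = Path(base_dir)
        self.suffix = suffix

    def save(self, partitions: dict[str, bytes]) -> None:
        # Each key becomes one file under base_dir; the node never builds these paths.
        self.base_dir.mkdir(parents=True, exist_ok=True)
        for key, payload in partitions.items():
            (self.base_dir / f"{key}{self.suffix}").write_bytes(payload)

    def load(self) -> dict[str, bytes]:
        # Read every partition back, keyed by file name without the suffix.
        return {
            path.stem: path.read_bytes()
            for path in sorted(self.base_dir.glob(f"*{self.suffix}"))
        }


def preprocess_clips(raw: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical node: pure business logic, data in and data out, no paths."""
    # Placeholder transform (byte reversal) standing in for real audio preprocessing.
    return {key: payload[::-1] for key, payload in raw.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = PartitionedWriter(tmp)  # in Kedro, the catalog would own this path
        writer.save(preprocess_clips({"clip_001": b"abc", "clip_002": b"xyz"}))
        print(writer.load())  # {'clip_001': b'cba', 'clip_002': b'zyx'}
```

Because the keys travel with the data, the node can be tested in memory, and the storage location can change without touching it.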

Adrien

01/11/2024, 5:05 PM
Ok ok, but how do you handle distributed computing with huge clusters?

datajoely

01/11/2024, 5:05 PM
are you using the dask.ParquetDataSet?

Adrien

01/11/2024, 5:07 PM
I'm using a custom webdataset for I/O efficiency, but this type of dataset is not supported by Kedro 😞
I'm processing small audio files in parallel
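Kedro does allow custom datasets: you wrap your own load/save logic in a class that subclasses its AbstractDataset base (AbstractDataSet in older releases) and reference that class from the catalog. The sketch below shows only the I/O part such a class could wrap. It assumes the WebDataset convention of packing many small files into a .tar shard, where files sharing a basename (the sample key) form one sample. AudioShardStore and its layout ('<key>.wav' plus '<key>.json' metadata) are hypothetical, and the sketch uses the standard-library tarfile module rather than the webdataset package.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


class AudioShardStore:
    """Hypothetical WebDataset-style shard: many small clips packed into one .tar file.

    Each sample is written as '<key>.wav' followed by '<key>.json' (metadata), so the
    files of one sample sit next to each other and share a basename, as in WebDataset.
    Keys must not contain dots, because the key is taken as everything before the first dot.
    """

    def __init__(self, shard_path: str) -> None:
        self.shard_path = Path(shard_path)

    def save(self, samples: dict[str, tuple[bytes, dict]]) -> None:
        self.shard_path.parent.mkdir(parents=True, exist_ok=True)
        with tarfile.open(self.shard_path, "w") as tar:
            for key, (audio, meta) in samples.items():
                for name, payload in ((f"{key}.wav", audio), (f"{key}.json", json.dumps(meta).encode())):
                    info = tarfile.TarInfo(name=name)
                    info.size = len(payload)  # tarfile needs the size before reading the payload
                    tar.addfile(info, io.BytesIO(payload))

    def load(self) -> dict[str, tuple[bytes, dict]]:
        grouped: dict[str, dict[str, bytes]] = {}
        with tarfile.open(self.shard_path, "r") as tar:
            for member in tar.getmembers():
                key, ext = member.name.split(".", 1)  # 'clip_001.wav' -> ('clip_001', 'wav')
                grouped.setdefault(key, {})[ext] = tar.extractfile(member).read()
        return {key: (parts["wav"], json.loads(parts["json"])) for key, parts in grouped.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AudioShardStore(f"{tmp}/shards/shard-000000.tar")
        store.save({"clip_001": (b"\x00\x01", {"sample_rate": 16000})})
        print(store.load())
```

Wrapped in a custom dataset class, the shard path would live in the catalog entry, so the node would only ever see the dict of samples.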

datajoely

01/11/2024, 5:08 PM
Okay, but I’m still not sure why the filepaths need to be part of the node logic
why do regular catalog entries not work?
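Here is one way to read the regular-catalog suggestion, sketched under assumptions. The parallel fan-out stays inside the node, but the node returns results keyed by clip id instead of writing files, so the catalog entry for its output decides the paths. In this sketch, concurrent.futures.ThreadPoolExecutor stands in for the Dask cluster. normalize_clip and preprocess_clips_parallel are hypothetical names, and the transform is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_clip(samples: list[float]) -> list[float]:
    # Placeholder per-file step: peak-normalise one clip to the range [-1, 1].
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak else list(samples)


def preprocess_clips_parallel(
    clips: dict[str, list[float]], max_workers: int = 4
) -> dict[str, list[float]]:
    # Hypothetical node: no paths in or out, only data keyed by clip id.
    keys = list(clips)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = list(pool.map(normalize_clip, (clips[k] for k in keys)))
    return dict(zip(keys, results))


if __name__ == "__main__":
    print(preprocess_clips_parallel({"clip_001": [0.5, -2.0], "clip_002": [0.0]}))
    # {'clip_001': [0.25, -1.0], 'clip_002': [0.0]}
```

With dask.distributed, the executor block would become roughly futures = client.map(normalize_clip, [clips[k] for k in keys]) followed by results = client.gather(futures), using a Client connected to the cluster. Either way, the node's inputs and outputs stay path-free.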

Adrien

01/11/2024, 5:09 PM
I was not aware of the Dask deployment tutorial, thanks for sharing! I'll read it and come back to you