Hello guys, I have a problem with kedro node defin...
# questions
a
Hello guys, I have a problem with kedro node definition. To preprocess my data, I use a Dask cluster in one of my nodes. My problem: for each parallel task, I need the output path, which is not accessible inside the node function. Has anyone solved this problem?
d
it’s less of a problem and more that kedro intentionally separates business logic from IO logic
we don’t really support conditional flow based on filepath outputs
there is a belief that the combinatorial complexity leads to headaches, which is why we steer people away from it
a
Ok ok but how do you handle distributed computing with huge clusters?
d
are you using the dask.ParquetDataSet?
a
I'm using a custom webdataset for I/O efficiency, but this type of dataset is not supported by kedro 😞
I'm parallel processing small audio files
d
Okay but I’m still not sure why the filepaths need to be part of the node logic
why do regular catalog entries not work?
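to make the question concrete, here is a rough sketch of the separation I mean, in plain Python with no kedro import. Everything in it is hypothetical and only for illustration: `FolderDataset`, `preprocess_clip`, `preprocess_node`, the `.bin` suffix and the clip names are not from your project. The node only sees partition keys and data; the IO class is the only place that knows the output folder. Kedro's PartitionedDataSet has the same shape (it loads a dict of partition id to load callable and saves a dict of partition id to data), and a custom dataset would implement `_load`, `_save` and `_describe` instead of the bare `load`/`save` used here.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def preprocess_clip(raw: bytes) -> bytes:
    """Stand-in for the real audio preprocessing (hypothetical: reverses the bytes)."""
    return raw[::-1]


def preprocess_node(clips: Dict[str, Callable[[], bytes]]) -> Dict[str, bytes]:
    """Pure node logic: partition keys in, partition keys out, no file paths."""
    return {key: preprocess_clip(load()) for key, load in clips.items()}


class FolderDataset:
    """Hypothetical IO layer: one file per partition key under `root`."""

    def __init__(self, root: Path, suffix: str = ".bin") -> None:
        self.root = Path(root)
        self.suffix = suffix

    def load(self) -> Dict[str, Callable[[], bytes]]:
        # Return bound read methods so each file is only read when called.
        return {
            p.stem: p.read_bytes
            for p in sorted(self.root.glob(f"*{self.suffix}"))
        }

    def save(self, data: Dict[str, bytes]) -> None:
        # The output path is built here, from the partition key, not in the node.
        self.root.mkdir(parents=True, exist_ok=True)
        for key, payload in data.items():
            (self.root / f"{key}{self.suffix}").write_bytes(payload)


with tempfile.TemporaryDirectory() as tmp:
    raw_dir, out_dir = Path(tmp) / "raw", Path(tmp) / "processed"
    # Seed two fake clips, then run load -> node -> save.
    FolderDataset(raw_dir).save({"clip_a": b"abc", "clip_b": b"xyz"})
    FolderDataset(out_dir).save(preprocess_node(FolderDataset(raw_dir).load()))
    written = sorted(p.name for p in out_dir.iterdir())
```

the per-clip work inside `preprocess_node` is where the Dask parallelism could go, as long as what comes back is still keyed by partition; whether that fits a webdataset shard layout is an open question on my side.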
a
I was not aware of the dask deployment tutorial, thanks for sharing! I'll read it and come back to you