I have a cluster algorithm that I like to run for ...
# questions
i
I have a clustering algorithm that I would like to run separately for each of N days. My idea is to run these in parallel. However, it is not clear to me how I can manage parallelism using Kedro nodes in this case. Any thoughts?
m
Kedro comes with different runners to allow you to run your pipeline sequentially, in parallel or using threads: https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#parallelrunner
i
My preference would be to create nodes dynamically, including parameters that are calculated at runtime rather than defined in the catalog. For instance, I want to run my algorithm for days 1, 2, and 3 in one execution, and for days 1, 2, 3, 4, and 5 in another execution. In the first execution I would like to create three nodes, and in the second execution I would like to create five nodes. Then the runner would take care of parallelism across nodes, I guess. Is this possible?
m
Dynamic node creation isn't something that comes out of the box with Kedro, because Kedro pipelines should be easily reproducible. But if you find a way of doing that, I think the ParallelRunner should still run whatever pipeline you've generated.
What do you mean by creating three nodes in one execution and five in another? When are the nodes actually computed? By "execution", do you mean creating a node or running the computation? Maybe to rephrase the question: how would you do this equivalently in plain Python (pseudo)code without Kedro?
i
In python I would do something similar to this:
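import multiprocessing

# Note: range excludes its upper bound, so this covers days 1 .. settings.value - 1
days = list(range(1, settings.value))

with multiprocessing.Pool(5) as pool:  # the pool is cleaned up on exit
    pool.map(run_algorithm, days)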
I could run the same code within one Kedro node, I guess, but I thought it might be better to use Kedro features to manage parallelism instead.
n
If you know ahead of time what you are going to loop over, you can just create a static pipeline, which is basically what https://getindata.com/blog/kedro-dynamic-pipelines/ does.
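As a rough sketch of that idea (my own illustration, not code taken from the linked post): build one node per day in a loop when the pipeline is created. The names run_algorithm, input_data, clusters_day_N and the n_days argument are all hypothetical, and how n_days reaches create_pipeline (settings, an environment variable, etc.) is left open here.

```python
from functools import partial


def run_algorithm(data, day):
    """Placeholder for the clustering step; hypothetical signature."""
    return {"day": day, "n_rows": len(data)}


def day_node_specs(n_days):
    """Build keyword arguments for one Kedro node per day (days 1..n_days)."""
    return [
        {
            "func": partial(run_algorithm, day=day),
            "inputs": "input_data",            # hypothetical catalog entry
            "outputs": f"clusters_day_{day}",  # hypothetical dataset names
            "name": f"run_algorithm_day_{day}",
        }
        for day in range(1, n_days + 1)
    ]


def create_pipeline(n_days=3):
    # Imported here so day_node_specs stays usable without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline([node(**spec) for spec in day_node_specs(n_days)])
```

Because the per-day nodes do not depend on each other, running with kedro run --runner=ParallelRunner should be able to schedule them concurrently.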
i
Thanks