I have a cluster algorithm that I'd like to run for N days separately (Kedro #questions)


Ignasi Mañé

05/22/2024, 7:49 AM

I have a cluster algorithm that I'd like to run separately for each of N days. My idea is to do so in parallel. However, it is not clear to me how I can manage parallelism using Kedro nodes in this case. Any thoughts?

Merel

05/22/2024, 8:05 AM

Kedro comes with different runners to allow you to run your pipeline sequentially, in parallel or using threads: https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#parallelrunner
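For readers unfamiliar with the three modes, here is a plain-Python sketch (not Kedro code) of the same independent per-day work run sequentially, in a thread pool, and in a process pool. That is roughly what the SequentialRunner, ThreadRunner and ParallelRunner do with nodes that don't depend on each other. `cluster_one_day` and `run_days` are hypothetical names, and the clustering itself is a placeholder.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cluster_one_day(day: int) -> tuple[int, str]:
    # Hypothetical stand-in for the real clustering step on one day's data.
    return day, f"clusters for day {day}"


def run_days(days: list[int], mode: str = "sequential", workers: int = 4) -> dict[int, str]:
    """Run cluster_one_day for every day; the days do not depend on each other."""
    if mode == "sequential":
        pairs = [cluster_one_day(d) for d in days]
    elif mode == "threads":
        # Threads share one interpreter; useful when the work releases the GIL.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            pairs = list(pool.map(cluster_one_day, days))
    elif mode == "processes":
        # Separate processes: the function must be picklable (module-level), and
        # on spawn-based platforms the caller needs an `if __name__ == "__main__":` guard.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            pairs = list(pool.map(cluster_one_day, days))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Executor.map preserves input order, so results line up with `days`.
    return dict(pairs)
```

Threads mainly help when the work releases the GIL (I/O, many NumPy routines); CPU-bound pure-Python clustering usually needs processes, which is also why process-based execution requires functions and data that can be pickled.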

Ignasi Mañé

05/22/2024, 9:15 AM

My preference would be to create nodes dynamically, including parameters that are calculated at runtime rather than defined in the catalog. For instance, I want to run my algorithm for days 1, 2, and 3 in one execution, and for days 1, 2, 3, 4, and 5 in another execution. In the first execution I would like to create three nodes, and in the second execution five nodes. Then the runner would take care of parallelism across nodes, I guess. Is this possible?

Merel

05/22/2024, 9:25 AM

Dynamic node creation isn't something that comes out of the box with Kedro, because Kedro pipelines should be easily reproducible. But if you find a way of doing that, I think the ParallelRunner should still run whatever pipeline you've generated.

Nok Lam Chan

05/22/2024, 9:31 AM

https://getindata.com/blog/kedro-dynamic-pipelines/

Nok Lam Chan

05/22/2024, 9:32 AM

What do you mean by creating 3 nodes in one execution and 5 in another? When are the nodes actually computed? By "execution", do you mean creating the nodes or running the computation? To rephrase the question: how would you do this in plain Python (pseudo)code, without Kedro?

Ignasi Mañé

05/22/2024, 10:34 AM

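In Python I would do something like this:

```
import multiprocessing

days = list(range(1, settings.value + 1))  # days 1..N; range() excludes its stop value

with multiprocessing.Pool(5) as pool:  # 5 worker processes
    results = pool.map(run_algorithm, days)  # blocks until every day is processed
```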

I guess I could run the same code within a single Kedro node, but I thought it might be better to use Kedro's features to manage parallelism instead.

Nok Lam Chan

05/22/2024, 10:42 AM

If you know ahead of time what you're going to loop through, you can just create a static pipeline; that's basically what https://getindata.com/blog/kedro-dynamic-pipelines/ does.
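When the list of days is fixed before the run starts (for example, read from configuration), the pipeline can be generated in a loop. The Kedro-free sketch below uses hypothetical names (`NodeSpec`, `cluster_day`, `build_day_nodes`, `all_independent`, and dataset names such as `clusters_day_1`) to show what matters for parallelism: each generated node gets a unique name and output, and no node consumes another node's output, so a parallel runner is free to execute them all at once.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable


def cluster_day(data: list[float], day: int) -> dict:
    # Hypothetical stand-in for the real clustering algorithm.
    return {"day": day, "n_points": len(data)}


@dataclass(frozen=True)
class NodeSpec:
    # Hypothetical, Kedro-free record of what one generated node would contain.
    name: str
    func: Callable
    inputs: list[str]
    outputs: str


def build_day_nodes(days: list[int]) -> list[NodeSpec]:
    """Generate one node per day, binding the day into the function up front."""
    return [
        NodeSpec(
            name=f"cluster_day_{d}",
            func=partial(cluster_day, day=d),
            inputs=["input_data"],
            outputs=f"clusters_day_{d}",
        )
        for d in days
    ]


def all_independent(specs: list[NodeSpec]) -> bool:
    """True if outputs are unique and no node consumes another node's output."""
    produced = {s.outputs for s in specs}
    consumed = {i for s in specs for i in s.inputs}
    return len(produced) == len(specs) and not (produced & consumed)
```

In a Kedro project, each spec would instead become a `node(func, inputs, outputs, name=...)` call inside the `pipeline([...])` returned by the pipeline's `create_pipeline` function. Days `[1, 2, 3]` then give three nodes and `[1, 2, 3, 4, 5]` give five, while the pipeline stays static for any given configuration.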


Ignasi Mañé

05/22/2024, 4:18 PM

Thanks
