#questions

Rickard Ström

12/14/2022, 6:06 PM
I would like to run a series of 3 pipelines individually for 30+ different input datasets, and save the result of each individually too. What is the recommended way to do this? Would a combination of using hooks to register the datasets in the catalog with a loop to register the pipelines in the pipeline_registry.py as discussed here work? https://kedro-org.slack.com/archives/C03RKP2LW64/p1667919274697439?thread_ts=1667910377.475889&cid=C03RKP2LW64 Or should I play with the environment? tagging team member @Adrien Couetoux 👋🙏
The hooks work well to register the catalog entries, but I'm still struggling to read the dataset id from the params when it's set as a command-line argument. I can see a bit of discussion about this in the threads above... did anyone find a solution?
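For reference, the hook-based catalog registration mentioned above looks roughly like this; a minimal sketch, where the dataset ids, file paths, and the CSVDataSet type are illustrative assumptions:
```python
# Minimal sketch of registering catalog entries per dataset via a hook.
# Dataset ids, paths, and the CSV dataset type are hypothetical.
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl

DATASET_IDS = ["ds_001", "ds_002"]  # hypothetical; 30+ in practice


class DynamicCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Add one input and one output entry per dataset id.
        for ds_id in DATASET_IDS:
            catalog.add(f"{ds_id}_input", CSVDataSet(filepath=f"data/01_raw/{ds_id}.csv"))
            catalog.add(f"{ds_id}_output", CSVDataSet(filepath=f"data/07_model_output/{ds_id}.csv"))
```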

Jordan

12/15/2022, 10:41 AM
I haven’t yet managed to work out the “correct” way of doing this. I’m still using the hacky solution of running a loop in the registry 😅
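The loop-in-the-registry approach would look roughly like this in pipeline_registry.py; a sketch, where create_pipeline and the dataset list are hypothetical:
```python
# Sketch of looping in pipeline_registry.py; create_pipeline(dataset_id)
# is a hypothetical per-dataset pipeline factory.
from kedro.pipeline import Pipeline

from my_project.pipelines.processing import create_pipeline  # hypothetical

DATASET_IDS = ["ds_001", "ds_002"]  # hypothetical


def register_pipelines() -> dict:
    # One named pipeline per dataset, plus a default that runs them all.
    pipelines = {ds_id: create_pipeline(dataset_id=ds_id) for ds_id in DATASET_IDS}
    pipelines["__default__"] = sum(pipelines.values(), Pipeline([]))
    return pipelines
```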

datajoely

12/15/2022, 10:50 AM
currently this is the best way

Rickard Ström

12/15/2022, 11:02 AM
@Jordan but how did you fetch the command line param into this function? Or are you running the pipelines for all datasets at the same time?

Jordan

12/15/2022, 11:18 AM
I just created a `run.py` file in the project root:
```python
import re
import subprocess

import yaml

# Read the list of dataset names from the project parameters.
with open("./conf/base/parameters.yml") as f:
    datasets = yaml.safe_load(f)["datasets"]

# Kick off one `kedro run` per dataset, targeting the pipeline
# registered for that dataset's start date.
for dataset in datasets:
    start_date = re.findall(r"(\d+)", dataset)[0]
    subprocess.run(
        f"kedro run --pipeline {start_date}_execution", shell=True, check=True
    )
```
It ain’t pretty, but it gets the job done.
I’m just wrapping up the documentation for this project; I’ll make the repo public and share the link when I’m done, if it helps to understand what I’m doing
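On the open question of reading the dataset id from the command line: runtime parameters passed with kedro run --params are merged into context.params, which a hook can read, but they are not visible inside register_pipelines, which is why the subprocess loop above sidesteps the problem. A hedged sketch (the --params separator syntax varies by Kedro version):
```python
# Hypothetical hook that reads a dataset id passed on the command line, e.g.
#   kedro run --params "dataset_id:ds_001"
# (":" separator in Kedro 0.18.x; newer releases also accept "=").
from kedro.framework.hooks import hook_impl


class RuntimeParamsHooks:
    @hook_impl
    def after_context_created(self, context):
        # context.params merges parameters.yml with any --params overrides.
        dataset_id = context.params.get("dataset_id")
        print(f"Requested dataset: {dataset_id}")
```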

Rickard Ström

12/15/2022, 11:20 AM
Great, thanks!