#questions

Rickard Ström

12/14/2022, 6:06 PM
I would like to run a series of 3 pipelines individually for 30+ different input datasets, and save the result of each individually too. What is the recommended way to do this? Would a combination of using hooks to register the datasets in the catalog with a loop to register the pipelines in the pipeline_registry.py as discussed here work? https://kedro-org.slack.com/archives/C03RKP2LW64/p1667919274697439?thread_ts=1667910377.475889&cid=C03RKP2LW64 Or should I play with the environment? tagging team member @Adrien Couetoux 👋🙏
The hooks work well to register the catalog entries, but I'm still struggling to read the dataset id from the params when it's set as a command-line argument. I can see a bit of discussion about this in the threads above... did anyone find a solution?
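For reference, the hook-based catalog registration mentioned above looks roughly like this; a minimal sketch, where the dataset ids, file paths, and the CSVDataSet type are illustrative assumptions:
```python
# Minimal sketch of registering catalog entries per dataset via a hook.
# Dataset ids, paths, and the CSV dataset type are hypothetical.
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl

DATASET_IDS = ["ds_001", "ds_002"]  # hypothetical; 30+ in practice


class DynamicCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Add one input and one output entry per dataset id.
        for ds_id in DATASET_IDS:
            catalog.add(f"{ds_id}_input", CSVDataSet(filepath=f"data/01_raw/{ds_id}.csv"))
            catalog.add(f"{ds_id}_output", CSVDataSet(filepath=f"data/07_model_output/{ds_id}.csv"))
```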

Jordan

12/15/2022, 10:41 AM
I haven’t yet managed to work out the “correct” way of doing this. I’m still using the hacky solution of running a loop in the registry 😅
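The loop-in-the-registry approach would look roughly like this in pipeline_registry.py; a sketch, where create_pipeline and the dataset list are hypothetical:
```python
# Sketch of looping in pipeline_registry.py; create_pipeline(dataset_id)
# is a hypothetical per-dataset pipeline factory.
from kedro.pipeline import Pipeline

from my_project.pipelines.processing import create_pipeline  # hypothetical

DATASET_IDS = ["ds_001", "ds_002"]  # hypothetical


def register_pipelines() -> dict:
    # One named pipeline per dataset, plus a default that runs them all.
    pipelines = {ds_id: create_pipeline(dataset_id=ds_id) for ds_id in DATASET_IDS}
    pipelines["__default__"] = sum(pipelines.values(), Pipeline([]))
    return pipelines
```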

datajoely

12/15/2022, 10:50 AM
currently this is the best way

Rickard Ström

12/15/2022, 11:02 AM
@Jordan but how did you fetch the command line param into this function? Or are you running the pipelines for all datasets at the same time?

Jordan

12/15/2022, 11:18 AM
I just created a `run.py` file in the project root:
```python
import re
import subprocess

import yaml

# Read the list of dataset names from the project parameters.
with open("./conf/base/parameters.yml") as f:
    datasets = yaml.safe_load(f)["datasets"]

# Kick off one `kedro run` per dataset, targeting the pipeline
# registered for that dataset's start date.
for dataset in datasets:
    start_date = re.findall(r"(\d+)", dataset)[0]
    subprocess.run(
        f"kedro run --pipeline {start_date}_execution", shell=True, check=True
    )
```
It ain’t pretty, but it gets the job done.
I’m just wrapping up the documentation for this project; I’ll make the repo public and share the link when I’m done, if it helps to understand what I’m doing
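On the open question of reading the dataset id from the command line: runtime parameters passed with kedro run --params are merged into context.params, which a hook can read, but they are not visible inside register_pipelines, which is why the subprocess loop above sidesteps the problem. A hedged sketch (the --params separator syntax varies by Kedro version):
```python
# Hypothetical hook that reads a dataset id passed on the command line, e.g.
#   kedro run --params "dataset_id:ds_001"
# (":" separator in Kedro 0.18.x; newer releases also accept "=").
from kedro.framework.hooks import hook_impl


class RuntimeParamsHooks:
    @hook_impl
    def after_context_created(self, context):
        # context.params merges parameters.yml with any --params overrides.
        dataset_id = context.params.get("dataset_id")
        print(f"Requested dataset: {dataset_id}")
```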

Rickard Ström

12/15/2022, 11:20 AM
Great, thanks!