#questions

Afiq Johari

01/18/2024, 2:24 PM
Hi, using the spaceflights example, let's say I have two different spaceflight companies, SpaceX and SpaceY. In the spaceflights example, we have the following data catalog:
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
Given SpaceX and SpaceY, I would like to define the spaceflight company as a parameter. From the data catalog's point of view, I hope to arrive at:
companies:
  type: pandas.CSVDataset
  filepath: ${spaceflight_company}/data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: ${spaceflight_company}/data/01_raw/reviews.csv

shuttles:
  type: pandas.ExcelDataset
  filepath: ${spaceflight_company}/data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
This saves me from duplicating catalog entries such as companies_SpaceX, companies_SpaceY, etc., and keeps the data catalog simple. All the Kedro nodes and pipelines would then depend on which spaceflight company I want to run. So instead of:
kedro run
kedro run --nodes=preprocess_companies_node,preprocess_shuttles_node
I hope to be able to specify which spaceflight company to run, so it would look something like:
kedro run --spaceflightcompany=SpaceX
kedro run --spaceflightcompany=SpaceX --nodes=preprocess_companies_node,preprocess_shuttles_node

Ankita Katiyar

01/18/2024, 2:36 PM
Hey Afiq, check out the dataset factories feature which might be useful - https://docs.kedro.org/en/stable/data/kedro_dataset_factories.html
👍 2

Deepyaman Datta

01/18/2024, 3:16 PM
Also see https://docs.kedro.org/en/stable/development/commands_reference.html#modifying-a-kedro-run; you can
kedro run --namespace SpaceX
to run a namespaced pipeline. You can reuse one modular pipeline containing these nodes, instantiated once per company under its own namespace. Like @Ankita Katiyar said, you would use something like dataset factories to make sure you have a catalog entry for each namespace.
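As a rough sketch of what those catalog entries could look like, assuming the pipeline is instantiated once with namespace SpaceX and once with namespace SpaceY, and that the raw datasets are left namespaced (so the nodes read `SpaceX.companies`, `SpaceY.companies`, and so on). The `{namespace}` placeholder name and the folder layout are assumptions taken from the paths in the question, not something confirmed in this thread:

```yaml
# conf/base/catalog.yml (sketch only)
# "{namespace}" is a dataset factory placeholder: a dataset named
# "SpaceX.companies" matches the first pattern with namespace = "SpaceX".
"{namespace}.companies":
  type: pandas.CSVDataset
  # Quoted because a YAML value starting with "{" would otherwise be
  # parsed as a mapping.
  filepath: "{namespace}/data/01_raw/companies.csv"

"{namespace}.reviews":
  type: pandas.CSVDataset
  filepath: "{namespace}/data/01_raw/reviews.csv"

"{namespace}.shuttles":
  type: pandas.ExcelDataset
  filepath: "{namespace}/data/01_raw/shuttles.xlsx"
  load_args:
    engine: openpyxl
```

With entries like these, `kedro run --namespace SpaceX` should read from `SpaceX/data/01_raw/...` without a separate catalog entry per company.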
👍 2