Afiq Johari
01/18/2024, 2:24 PM
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
Given two companies, SpaceX and SpaceY, I would like to define the spaceflight company as a parameter.
From the data catalog's point of view, I hope to arrive at:
companies:
  type: pandas.CSVDataset
  filepath: ${spaceflight_company}/data/01_raw/companies.csv
reviews:
  type: pandas.CSVDataset
  filepath: ${spaceflight_company}/data/01_raw/reviews.csv
shuttles:
  type: pandas.ExcelDataset
  filepath: ${spaceflight_company}/data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
This would save me from duplicating catalog entries such as companies_SpaceX, companies_SpaceY, etc., and keep the data catalog simple.
All the Kedro nodes and pipelines would then depend on which spaceflight company I choose to run.
So instead of
kedro run
kedro run --nodes=preprocess_companies_node,preprocess_shuttles_node
I hope to be able to specify which spaceflight company I want to run, like this:
kedro run --spaceflightcompany=SpaceX
kedro run --spaceflightcompany=SpaceX --nodes=preprocess_companies_node,preprocess_shuttles_node
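One way the catalog above might be expressed is sketched below. It assumes the project uses Kedro's OmegaConfigLoader and its `runtime_params` resolver. The parameter name `spaceflight_company` is the one proposed above, and the `--params` invocation in the comment is an assumption about how the value would be passed in place of the hypothetical `--spaceflightcompany` flag.

```yaml
# conf/base/catalog.yml (sketch; assumes OmegaConfigLoader is the config loader)
# Assumed invocation that supplies the value at run time:
#   kedro run --params=spaceflight_company=SpaceX
companies:
  type: pandas.CSVDataset
  filepath: "${runtime_params:spaceflight_company}/data/01_raw/companies.csv"

reviews:
  type: pandas.CSVDataset
  filepath: "${runtime_params:spaceflight_company}/data/01_raw/reviews.csv"

shuttles:
  type: pandas.ExcelDataset
  filepath: "${runtime_params:spaceflight_company}/data/01_raw/shuttles.xlsx"
  load_args:
    engine: openpyxl
```

With this layout the dataset names stay the same for every company, so the existing node definitions would not need to change; only the resolved file paths differ per run.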
Ankita Katiyar
01/18/2024, 2:36 PM
Deepyaman Datta
01/18/2024, 3:16 PM
kedro run --namespace SpaceX
to run a namespaced pipeline. You can reuse a modular pipeline with these nodes.
Like @Ankita Katiyar said, you would use something like dataset factories to make sure you have a catalog entry for each namespace.
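A minimal sketch of what such dataset factory entries could look like, assuming the pipeline is instantiated once per company with namespaces named SpaceX and SpaceY, so that dataset names become SpaceX.companies, SpaceY.companies, and so on. The placeholder name `{namespace}` is chosen here for illustration.

```yaml
# conf/base/catalog.yml (sketch; `{namespace}` is an illustrative placeholder name)
# Each pattern is matched against dataset names such as SpaceX.companies,
# and the captured value is substituted into the filepath.
"{namespace}.companies":
  type: pandas.CSVDataset
  filepath: "{namespace}/data/01_raw/companies.csv"

"{namespace}.reviews":
  type: pandas.CSVDataset
  filepath: "{namespace}/data/01_raw/reviews.csv"

"{namespace}.shuttles":
  type: pandas.ExcelDataset
  filepath: "{namespace}/data/01_raw/shuttles.xlsx"
  load_args:
    engine: openpyxl
```

Three patterns then cover every namespace, so no companies_SpaceX or companies_SpaceY entries need to be written by hand.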