Anjali Datta
03/22/2023, 1:00 AM
```python
from kedro.io import DataCatalog, PartitionedDataSet
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.config import ConfigLoader

conf_paths = ["conf/base", "conf/local"]
conf_loader = ConfigLoader(conf_paths)
atlas_regions = conf_loader.get("atlas_regions*")  # A .yml file consisting of regions with names

# Build one dataset per region, keyed by a region-specific name.
catalog_dictionary = {}
for region in atlas_regions["regions"]:
    name = region["name"]
    # catalog_dictionary[f"{name}_data_right"] = PartitionedDataSet(
    #     path="../ClinicalDTI/R_VIM/",
    #     dataset="programmatic_datasets.io.nifti.NIfTIDataSet",
    #     filename_suffix=f"seedmasks/{name}_R_T1.nii.gz",
    # )
    catalog_dictionary[f"{name}_data_right"] = CSVDataSet(filepath="../data/01_raw/iris.csv")
    # catalog_dictionary[f"{name}_data_right_output"] = CSVDataSet(filepath="../data/01_raw/iris.csv")

io = DataCatalog(catalog_dictionary)
print(io.list())
```
(Kedro version 0.17.7)
Running `catalog.py` prints the expected list of datasets. But what do I need to do to be able to use these datasets in a pipeline?
datajoely
03/22/2023, 7:53 AM
Deepyaman Datta
03/22/2023, 10:31 AM
`PartitionedDataSet` for each region based on the region name, presumably want to run a pipeline for each region to create output for that region).
`regions` in two places, because you can't use something like `globals.yml` inside Jinja.
• In Kedro 0.19, I think a new `OmegaConfLoader` will be the preferred way to go, and I don't think that supports Jinja. I'm not 100% sure how this use case would be best handled there.
• Too much Jinja makes pipelines confusing (I think this use case for reused modular pipelines is fair, though).
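A Jinja loop over the regions in the catalog just expands to one catalog entry per region. As a rough, Kedro-free illustration, the sketch below builds the equivalent entries from a single Python list, reusing the `PartitionedDataSet` settings from the commented-out lines of the original `catalog.py`. The region names and the `catalog_entries` helper are hypothetical; in a project, entries like these would be rendered into `catalog.yml` (or passed to `DataCatalog.from_config`) rather than printed.

```python
from typing import Any, Dict, List

# Hypothetical region names; the real ones come from the atlas_regions*.yml config.
REGIONS: List[str] = ["region_a", "region_b"]


def catalog_entries(regions: List[str]) -> Dict[str, Dict[str, Any]]:
    """Return one catalog entry per region, mirroring what a Jinja loop would render."""
    entries: Dict[str, Dict[str, Any]] = {}
    for name in regions:
        entries[f"{name}_data_right"] = {
            "type": "PartitionedDataSet",
            "path": "../ClinicalDTI/R_VIM/",
            "dataset": "programmatic_datasets.io.nifti.NIfTIDataSet",
            # Only files ending with this suffix are picked up as partitions.
            "filename_suffix": f"seedmasks/{name}_R_T1.nii.gz",
        }
    return entries


if __name__ == "__main__":
    for key, entry in catalog_entries(REGIONS).items():
        print(key, entry["filename_suffix"])
```

Because the region list lives in one place here, the same list could also drive pipeline creation; with Jinja inside `catalog.yml`, `regions` has to be repeated, as noted above.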
If you aren't familiar with namespacing/reuse of modular pipelines, see https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#using-the-modular-pipeline-wrapper-to-provide-overrides
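As a very simplified model of what namespacing does when a modular pipeline is reused (in Kedro, via the `pipeline()` wrapper and its `namespace` and `inputs`/`outputs` arguments): every dataset name gets a `<namespace>.` prefix unless it is explicitly overridden. This is a toy re-implementation for illustration, not Kedro's code; it ignores parameters and validation, and the dataset names `seedmask` and `tract_stats` are hypothetical. The override points the shared input at the per-region catalog name `{name}_data_right` from the original script.

```python
from typing import Dict, List, Optional


def namespaced(name: str, namespace: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Toy model: overridden names map as given, all others get a '<namespace>.' prefix."""
    overrides = overrides or {}
    if name in overrides:
        return overrides[name]
    return f"{namespace}.{name}"


def reuse_for_regions(
    inputs: List[str], outputs: List[str], regions: List[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Rename one template node's inputs/outputs once per region (hypothetical helper)."""
    renamed: Dict[str, Dict[str, List[str]]] = {}
    for region in regions:
        # Map the shared input onto the catalog entry built in catalog.py.
        overrides = {"seedmask": f"{region}_data_right"}
        renamed[region] = {
            "inputs": [namespaced(n, region, overrides) for n in inputs],
            "outputs": [namespaced(n, region, overrides) for n in outputs],
        }
    return renamed


if __name__ == "__main__":
    print(reuse_for_regions(["seedmask"], ["tract_stats"], ["region_a", "region_b"]))
```

In this model, `region_a` reads `region_a_data_right` and writes `region_a.tract_stats`, so each region's outputs stay distinct without a hand-written pipeline per region.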
I can try and add an example of keeping the pipeline definition in Python and using with pipelines as an alternative, even though I don't think it's well documented.
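Keeping the per-region definitions in Python could look roughly like the registry sketch below. In a Kedro 0.17.7 project, `register_pipelines()` in `pipeline_registry.py` returns a mapping from pipeline name to `Pipeline`, including a `"__default__"` entry; without autodiscovery, each entry is listed by hand. To keep the sketch runnable without Kedro, a "pipeline" here is only a list of node names, and `REGIONS`, `BASE_NODES`, and the `data_processing.<region>` keys are hypothetical.

```python
from typing import Dict, List

# Stand-in for kedro.pipeline.Pipeline so the sketch runs without Kedro installed:
# a "pipeline" here is just an ordered list of node names.
Pipeline = List[str]

# Hypothetical values; in the thread these would come from atlas_regions*.yml
# and from the data_processing pipeline's node definitions.
REGIONS: List[str] = ["region_a", "region_b"]
BASE_NODES: List[str] = ["load_seedmask", "compute_stats"]


def register_pipelines() -> Dict[str, Pipeline]:
    """Explicit registry: every pipeline name is added by hand."""
    pipelines: Dict[str, Pipeline] = {}
    for region in REGIONS:
        # One namespaced copy of the base pipeline per region.
        pipelines[f"data_processing.{region}"] = [f"{region}.{n}" for n in BASE_NODES]
    # "data_processing" combines all regional copies, registered explicitly.
    pipelines["data_processing"] = [
        n for r in REGIONS for n in pipelines[f"data_processing.{r}"]
    ]
    pipelines["__default__"] = pipelines["data_processing"]
    return pipelines


if __name__ == "__main__":
    for name, nodes in register_pipelines().items():
        print(name, nodes)
```

With real Kedro objects, the combined entries would typically be built by adding `Pipeline` objects together rather than concatenating lists.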
P.S. I used Kedro 0.18.6, which includes some stuff like pipeline autodiscovery (https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html#pipeline-autodiscovery); if you try to replicate this with 0.17.7, you will need to add `"data_processing"` to the pipeline registry explicitly.
Anjali Datta
03/24/2023, 3:57 AM