Anjali Datta

03/22/2023, 1:00 AM
I’m inexperienced, so this is a basic question. I’m trying to add datasets programmatically. I’ve made a file that contains:
from <|> import DataCatalog
from <|> import PartitionedDataSet
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.config import ConfigLoader
conf_paths = ['conf/base', 'conf/local']
conf_loader = ConfigLoader(conf_paths)
atlas_regions = conf_loader.get('atlas_regions*') # A .yml file consisting of regions with names
catalog_dictionary = {}
for region in atlas_regions['regions']:
    name = region['name']
    # catalog_dictionary[f'{name}_data_right'] = PartitionedDataSet(path = '../ClinicalDTI/R_VIM/', \
    #     dataset = '<|>.nifti.NIfTIDataSet', filename_suffix = f'seedmasks/{name}_R_T1.nii.gz')
    catalog_dictionary[f'{name}_data_right'] = CSVDataSet(filepath = "../data/01_raw/iris.csv")
    # catalog_dictionary[f'{name}_data_right_output'] = CSVDataSet(filepath = "../data/01_raw/iris.csv")
io = DataCatalog(catalog_dictionary)
(Kedro version 0.17.7) Running it prints the expected list of datasets. But what do I need to do to be able to use these datasets in a pipeline?


03/22/2023, 7:53 AM
Hi @Anjali Datta - since you say you’re new to Kedro, I’d highly recommend you follow the tutorials, since this isn’t the recommended approach.

Deepyaman Datta

03/22/2023, 10:31 AM
@datajoely I think this use case isn't covered by Spaceflights, because the data layout is complex: there's a configurable set of regions, and for each region, based on the region name, you presumably want to run a pipeline that creates the output for that region.
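To make that layout concrete, here's a minimal sketch in plain Python that expands a list of region names into one catalog entry per region. The `build_region_catalog` helper, the region names, and the file paths are all hypothetical. The entry shape (`type` plus `filepath`) mirrors a `catalog.yml` entry, so a dict like this is the kind of thing `DataCatalog.from_config` accepts; the sketch itself doesn't import Kedro.

```python
from typing import Dict, List


def build_region_catalog(regions: List[str], raw_dir: str = "data/01_raw") -> Dict[str, dict]:
    """Return one catalog-style entry per region (hypothetical helper)."""
    catalog_config = {}
    for name in regions:
        # Key follows the `{name}_data_right` pattern from the question above.
        catalog_config[f"{name}_data_right"] = {
            "type": "pandas.CSVDataSet",
            "filepath": f"{raw_dir}/{name}_right.csv",  # assumed file layout
        }
    return catalog_config


# Hypothetical region names, standing in for the contents of the regions .yml file.
catalog_config = build_region_catalog(["thalamus", "putamen"])
for dataset_name, entry in catalog_config.items():
    print(dataset_name, "->", entry["filepath"])
```

Because the entries are plain dicts, the same structure could equally be written out by hand or generated by a template; only the naming pattern matters to the pipeline that consumes them.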
@Anjali Datta I've created a quick-and-dirty example of how you can have a dynamic catalog + pipelines using Jinja (see […]). This is the diff on top of just creating a new project named "Jinja Example": […]2cf4f017733483d62adbff5
Cons of this approach:
• I've defined […] in two places, because you can't use something like […] inside Jinja.
• In Kedro 0.19, I think a new OmegaConfLoader will be the preferred way to go, and I don't think that supports Jinja. I'm not 100% sure how this use case would be best handled there.
• Too much Jinja makes pipelines confusing (I think this use case for reused modular pipelines is fair, though).
If you aren't familiar with namespacing/reuse of modular pipelines, see […]. I can try and add an example of keeping the pipeline definition in Python and using […] with pipelines as an alternative, even though I don't think it's well documented.
P.S. I used Kedro 0.18.6, which includes some stuff like pipeline autodiscovery ([…]); if you try to replicate with 0.17.7, you will need to add "data_processing" explicitly.
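For the namespacing idea mentioned above, here's a rough sketch of the naming convention only; it doesn't call Kedro. When a modular pipeline is reused under a namespace, Kedro prefixes the dataset names that aren't explicitly remapped with `<namespace>.`, so each region needs catalog entries under the matching prefixed names. The `namespaced_names` helper, the template dataset names, and the region names are made up for illustration.

```python
from typing import Dict, List

# Dataset names used by a single, region-agnostic pipeline (hypothetical).
TEMPLATE_DATASETS = ["data_right", "data_right_output"]


def namespaced_names(regions: List[str], datasets: List[str]) -> Dict[str, List[str]]:
    """Map each region to the dataset names its namespaced pipeline copy would use."""
    return {
        region: [f"{region}.{dataset}" for dataset in datasets]
        for region in regions
    }


# Hypothetical region names; in practice these would come from the regions config.
names = namespaced_names(["thalamus", "putamen"], TEMPLATE_DATASETS)
for region, dataset_names in names.items():
    print(region, dataset_names)
```

Each name produced this way would need a corresponding catalog entry, whether that entry is written by hand, generated with Jinja, or built in Python.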

Anjali Datta

03/24/2023, 3:57 AM
Thank you so much, @datajoely and @Deepyaman Datta!!