# questions
r
Hi everyone, is there a way to dynamically set the name of an output without manually defining the same outputs with variations in the catalog? Context: I have a pipeline that saves 15 different outputs that are defined in my catalog, but now I need to save each one of them by category, as `{category}_output_1.parquet`, `{category}_output_2.parquet`, and so on... Any alternative suggestion is welcome 🙂
d
This is somewhere hooks can help,
but we often say that Kedro is designed with reproducibility in mind,
so we don't necessarily encourage users to go down this route
r
Mmm, yes, I guess I'll need to save the complete outputs and then add a non-Kedro job to split them
d
you could do that in an `after_pipeline_run` hook
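A minimal sketch of such a hook, assuming the combined output is a pandas DataFrame with a category column (the dataset name `output_1`, the column name, and the destination folder are illustrative, not from this thread):

```python
# hooks.py -- illustrative sketch only: "output_1", the "category" column
# and the destination folder are assumptions, not names from this thread.
from pathlib import Path

from kedro.framework.hooks import hook_impl


class SplitOutputsByCategoryHooks:
    @hook_impl
    def after_pipeline_run(self, run_params, pipeline, catalog):
        # Load the combined output the pipeline already saved, then write
        # one parquet file per category value.
        df = catalog.load("output_1")  # hypothetical catalog entry
        out_dir = Path("data/08_reporting")  # hypothetical destination
        out_dir.mkdir(parents=True, exist_ok=True)
        for category, group in df.groupby("category"):
            group.to_parquet(out_dir / f"{category}_output_1.parquet")
```

(the hook still needs to be registered, e.g. via `HOOKS` in `settings.py`, depending on your Kedro version)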
r
Ok, I'll give it a try, thanks!
i
@Rob From what I understand, this is changing very soon, but the Jinja2 support might be helpful to ease the burden of defining your datasets in the catalog. It doesn't solve the issue at run time, though, if that's what you need. https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#jinja2-support
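For reference, the Jinja2 support in the linked docs lets you generate the repeated entries with a loop directly in the YAML; a sketch (the category names here are placeholders):

```yaml
# catalog.yml -- Jinja2 loop sketch; the category list is a placeholder
{% for category in ["books", "music", "games"] %}
{{ category }}_output_1:
  type: pandas.ParquetDataSet
  filepath: data/07_model_output/{{ category }}_output_1.parquet
{% endfor %}
```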
r
That also looks like a smart alternative; I'm not quite familiar with it yet. Thanks Ian!
f
I think another approach could be to use a custom `TemplatedConfigLoader`, where you update the `globals_dict` with the `runtime_params`. This way, the parameters that you pass can be fed into the catalog definition (a sketch of such a loader follows below). Then, in your `catalog.yml`, you can define your entry this way:
```yaml
output1:
  type: pandas.ParquetDataSet
  filepath: ${category|NOT_DEFINED}_output_1.parquet
```
Then when you run your pipeline with `kedro run --pipeline my_pipeline --params category:category1`, the path will be updated according to the passed parameter.
I used a default value of `NOT_DEFINED`, but depending on your use case, not having a default and failing when you don't provide the parameter could be a safer/better choice.
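A minimal sketch of that custom loader, assuming a Kedro 0.17-style `register_config_loader` hook (the exact hook name and signature vary between Kedro versions):

```python
# hooks.py -- sketch of feeding runtime params into the template globals;
# assumes a Kedro 0.17-style register_config_loader hook.
from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths, env, extra_params):
        # extra_params holds what was passed via `kedro run --params ...`,
        # so ${category|NOT_DEFINED} in catalog.yml resolves from it.
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict=extra_params or {},
        )
```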
r
I'll consider it, because I'll need to export the outputs for each one of the categories present, not as parameters (it's a huge list), so for now I think my case fits better with a hook. Thanks @FlorianGD
s
Why is a regular partitioned dataset not applicable?
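For completeness, a sketch of what that could look like (the entry name and path are placeholders): the node returns a dict such as `{f"{category}_output_1": df, ...}` and Kedro writes one file per key.

```yaml
# catalog.yml -- PartitionedDataSet sketch; name and path are placeholders
category_outputs:
  type: PartitionedDataSet
  path: data/07_model_output/category_outputs
  dataset: pandas.ParquetDataSet
  filename_suffix: ".parquet"
```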