#questions

Rob

02/13/2023, 4:52 PM
Hi everyone, is there a way to dynamically set the name of an output without manually defining the same outputs with variations in the catalog? Context: I have a pipeline that saves 15 different outputs that are defined in my catalog, but now I need to save each one of them by category, as `{category}_output_1.parquet`, `{category}_output_2.parquet`, and so on... Any alternative suggestion is welcome 🙂

datajoely

02/13/2023, 4:56 PM
This is somewhere hooks can help, but we often say that Kedro is designed with reproducibility in mind, so we don't necessarily encourage users to go down this route.

Rob

02/13/2023, 5:00 PM
Mmm, yes, I guess I'll need to save the complete outputs and then add a non-Kedro job to split them.

datajoely

02/13/2023, 5:01 PM
You could do that in an `after_pipeline_run` hook.
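A minimal sketch of what such a hook might look like, assuming the outputs are pandas DataFrames registered in the catalog and contain a column that identifies the category. The class name `SplitByCategoryHooks`, the helper `category_path`, the `category` column, and the `data/08_reporting` folder are all hypothetical names chosen for illustration; only `hook_impl` and the `after_pipeline_run` hook name come from Kedro.

```python
from pathlib import Path

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Stand-in decorator, only so the sketch can run where Kedro is not installed.
    def hook_impl(func):
        return func


def category_path(base_dir, category, dataset_name):
    """Build a per-category target such as <base_dir>/<category>_output_1.parquet."""
    return Path(base_dir) / f"{category}_{dataset_name}.parquet"


class SplitByCategoryHooks:
    """Hypothetical hook that re-saves catalog outputs as one file per category."""

    def __init__(self, dataset_names, category_column="category",
                 base_dir="data/08_reporting"):
        self.dataset_names = list(dataset_names)
        self.category_column = category_column  # assumed column name
        self.base_dir = base_dir  # assumed output folder

    @hook_impl
    def after_pipeline_run(self, catalog):
        for name in self.dataset_names:
            frame = catalog.load(name)  # assumed to return a pandas DataFrame
            for category, group in frame.groupby(self.category_column):
                target = category_path(self.base_dir, category, name)
                target.parent.mkdir(parents=True, exist_ok=True)
                group.to_parquet(target, index=False)
```

Under these assumptions, the hook would be registered in the project's `settings.py`, for example with `HOOKS = (SplitByCategoryHooks(["output_1", "output_2"]),)`. The per-category files are written outside the catalog, which is the reproducibility trade-off mentioned above.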

Rob

02/13/2023, 5:02 PM
OK, I'll give it a try, thanks!

Ian Whalen

02/13/2023, 5:06 PM
@Rob From what I understand, this is changing very soon, but the Jinja2 support might help ease the burden of defining your datasets in the catalog. It doesn't solve the issue at run time, though, if that's what you need. https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#jinja2-support

Rob

02/13/2023, 5:11 PM
That also looks like a smart alternative, though I'm not quite familiar with it yet. Thanks Ian!

FlorianGD

02/13/2023, 5:19 PM
I think another approach could be to use a custom `TemplatedConfigLoader`, where you update the `globals_dict` with the `runtime_params`. This way, the parameters that you pass can be fed into the catalog definition. Then, in your `catalog.yml`, you can define your entry this way:
output1:
    type: pandas.ParquetDataSet
    filepath: ${category|NOT_DEFINED}_output_1.parquet
Then, when you run your pipeline with `kedro run --pipeline my_pipeline --params category:category1`, the path will be updated according to the passed parameter. I used a default value of `NOT_DEFINED`, but depending on your use case, not having a default and failing when you do not provide the parameter could be a safer/better choice.
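A rough sketch of such a custom loader, assuming Kedro 0.18.x, where `TemplatedConfigLoader` is available and accepts a `globals_dict` keyword argument. The class name `RuntimeTemplatedConfigLoader` and the helper `merge_globals` are hypothetical; letting runtime parameters override static globals with the same key is one possible design choice, not something stated in the thread.

```python
try:
    from kedro.config import TemplatedConfigLoader  # assumes Kedro 0.18.x
except ImportError:
    # Minimal stand-in, only so the sketch can run where that class is unavailable.
    class TemplatedConfigLoader:
        def __init__(self, conf_source, env=None, runtime_params=None, *,
                     globals_dict=None, **kwargs):
            self.conf_source = conf_source
            self.runtime_params = runtime_params
            self.globals_dict = globals_dict


def merge_globals(globals_dict, runtime_params):
    """Combine both mappings; runtime params win when a key appears in both."""
    return {**(globals_dict or {}), **(runtime_params or {})}


class RuntimeTemplatedConfigLoader(TemplatedConfigLoader):
    """Hypothetical loader that exposes --params values to ${...} placeholders."""

    def __init__(self, conf_source, env=None, runtime_params=None, *,
                 globals_dict=None, **kwargs):
        super().__init__(
            conf_source,
            env=env,
            runtime_params=runtime_params,
            globals_dict=merge_globals(globals_dict, runtime_params),
            **kwargs,
        )


# In settings.py (assumed project layout):
# CONFIG_LOADER_CLASS = RuntimeTemplatedConfigLoader
```

With this in place, a value passed as `--params category:category1` would be visible to the `${category...}` placeholder in the catalog entry.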

Rob

02/13/2023, 5:30 PM
I'll consider it, but I need to export the outputs for each one of the categories present, not pass them as parameters (it's a huge list), so for now I think my case fits better in a hook. Thanks @FlorianGD

Sebastian Pehle

02/14/2023, 7:13 PM
Why is a regular partitioned dataset not applicable?
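For reference, a sketch of how a partitioned dataset could cover this case, assuming the output is a pandas DataFrame with a column that identifies the category. The node name `split_by_category`, the `category` column, and the catalog entry in the comments are hypothetical; the relevant Kedro behaviour is that a `PartitionedDataSet` saves a dictionary of partition id to data, one file per key.

```python
def split_by_category(frame, dataset_name="output_1", category_column="category"):
    """Return {partition_id: DataFrame}, the shape a partitioned dataset saves."""
    return {
        f"{category}_{dataset_name}": group
        for category, group in frame.groupby(category_column)
    }


# Assumed catalog entry (Kedro 0.18-style names, hypothetical path):
# output_1_by_category:
#   type: PartitionedDataSet
#   path: data/07_model_output/output_1
#   dataset: pandas.ParquetDataSet
#   filename_suffix: ".parquet"
```

Under these assumptions, a node returning this dictionary into `output_1_by_category` would produce files such as `category1_output_1.parquet` without the list of categories being known in advance.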