# questions
r
Hi everyone, is there a way to dynamically set the name of an output without manually defining the same outputs with variations in the catalog? Context: I have a pipeline that saves 15 different outputs that are defined in my catalog, but now I need to save each one of them by category, as `{category}_output_1.parquet`, `{category}_output_2.parquet`, and so on... Any alternative suggestion is welcome 🙂
d
This is somewhere hooks can help,
but we often say that Kedro is designed with reproducibility in mind,
so we don't necessarily encourage users to go down this route
r
Mmm, yes, I guess I'll need to save the complete outputs and then add a non-Kedro job to split them
d
you could do that in an `after_pipeline_run` hook
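A minimal sketch of such a hook, assuming the combined output is a pandas DataFrame with a category column (the dataset name `output_1`, the column name, and the destination folder are illustrative, not from this thread):

```python
# hooks.py -- illustrative sketch only: "output_1", the "category" column
# and the destination folder are assumptions, not names from this thread.
from pathlib import Path

from kedro.framework.hooks import hook_impl


class SplitOutputsByCategoryHooks:
    @hook_impl
    def after_pipeline_run(self, run_params, pipeline, catalog):
        # Load the combined output the pipeline already saved, then write
        # one parquet file per category value.
        df = catalog.load("output_1")  # hypothetical catalog entry
        out_dir = Path("data/08_reporting")  # hypothetical destination
        out_dir.mkdir(parents=True, exist_ok=True)
        for category, group in df.groupby("category"):
            group.to_parquet(out_dir / f"{category}_output_1.parquet")
```

(the hook still needs to be registered, e.g. via `HOOKS` in `settings.py`, depending on your Kedro version)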
r
Ok, I'll give it a try, thanks!
i
@Rob From what I understand, this is changing very soon, but the Jinja2 support might be helpful to ease the burden of defining your datasets in the catalog. It doesn't solve the issue at run time, though, if that's what you need. https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#jinja2-support
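For reference, the Jinja2 support in the linked docs lets you generate the repeated entries with a loop directly in the YAML; a sketch (the category names here are placeholders):

```yaml
# catalog.yml -- Jinja2 loop sketch; the category list is a placeholder
{% for category in ["books", "music", "games"] %}
{{ category }}_output_1:
  type: pandas.ParquetDataSet
  filepath: data/07_model_output/{{ category }}_output_1.parquet
{% endfor %}
```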
r
That also looks like a smart alternative; I'm not quite familiar with it yet. Thanks Ian!
f
I think another approach could be to use a custom `TemplatedConfigLoader`, where you update the `globals_dict` with the `runtime_params`. This way, the parameters that you pass can be fed into the catalog definition (a sketch of such a loader follows below). Then, in your `catalog.yml`, you can define your entry this way:
```yaml
output1:
  type: pandas.ParquetDataSet
  filepath: ${category|NOT_DEFINED}_output_1.parquet
```
Then when you run your pipeline with `kedro run --pipeline my_pipeline --params category:category1`, the path will be updated according to the passed parameter.
I used a default value of `NOT_DEFINED`, but depending on your use case, not having a default and failing when you don't provide the parameter could be a safer/better choice.
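A minimal sketch of that custom loader, assuming a Kedro 0.17-style `register_config_loader` hook (the exact hook name and signature vary between Kedro versions):

```python
# hooks.py -- sketch of feeding runtime params into the template globals;
# assumes a Kedro 0.17-style register_config_loader hook.
from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths, env, extra_params):
        # extra_params holds what was passed via `kedro run --params ...`,
        # so ${category|NOT_DEFINED} in catalog.yml resolves from it.
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict=extra_params or {},
        )
```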
r
I'll consider it, because I'll need to export the outputs for each one of the categories present, not as parameters (it's a huge list), so for now I think my case fits better with a hook. Thanks @FlorianGD
s
Why is a regular partitioned dataset not applicable?
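For completeness, a sketch of what that could look like (the entry name and path are placeholders): the node returns a dict such as `{f"{category}_output_1": df, ...}` and Kedro writes one file per key.

```yaml
# catalog.yml -- PartitionedDataSet sketch; name and path are placeholders
category_outputs:
  type: PartitionedDataSet
  path: data/07_model_output/category_outputs
  dataset: pandas.ParquetDataSet
  filename_suffix: ".parquet"
```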