# questions
Hello kedro team! I have a kedro issue, let's see if you can help me... We have a kedro pipeline that trains a model and generates a dataframe as output. The problem now is that we need to loop that pipeline to generate multiple dataframes (which, at the end, we want to concatenate into a single table). Is it possible, given a parameter such as `set_targets = ['a', 'b', 'c']`, to loop the same pipeline for each value of that list without "copying" that pipeline? The "`set_of_targets`" may have a different length and different names, so we want to avoid manual work... Also, we need the outputs to have "dynamic" names in the catalog in order to save all the outputs (
)... I think this could be done with
, but no idea where to start... Thank you very much!
You can simply reuse the same pipeline and give aliases to the datasets from your catalogue. https://kedro.readthedocs.io/en/0.17.6/06_nodes_and_pipelines/03_modular_pipelines.html#how-to-use-a-modular-pipeline-twice
Iterating over a pipeline is not difficult. You can iterate as many times as you want; just keep changing the aliases, and make sure each aliased dataset is also defined in the catalogue.
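A minimal sketch of that loop, assuming the list of targets is already available when the pipeline is built. `train_and_predict`, `model_input`, `predictions` and `prediction_dataset_names` are hypothetical names, and binding the target with `functools.partial` is a choice made here to keep the example small, not something the thread prescribes. The import path follows the 0.17.6 docs linked above; other kedro versions may expose `pipeline` elsewhere.

```python
from functools import partial
from typing import List


def train_and_predict(model_input, target: str):
    """Hypothetical stand-in for the real training/prediction step."""
    raise NotImplementedError


def prediction_dataset_names(targets: List[str]) -> List[str]:
    """Dataset names produced by the namespaced copies, e.g. 'a.predictions'."""
    return [f"{target}.predictions" for target in targets]


def create_pipeline(targets: List[str]):
    # Imported here so the naming helper above works without kedro installed.
    # Import path as in the kedro 0.17.6 docs (an assumption for other versions).
    from kedro.pipeline import Pipeline, node
    from kedro.pipeline.modular_pipeline import pipeline

    copies = []
    for target in targets:
        base = Pipeline(
            [
                node(
                    partial(train_and_predict, target=target),
                    inputs="model_input",
                    outputs="predictions",
                    name="train_and_predict",
                )
            ]
        )
        # The namespace turns 'predictions' into '<target>.predictions';
        # mapping 'model_input' to itself keeps the shared input unprefixed.
        copies.append(
            pipeline(base, namespace=target, inputs={"model_input": "model_input"})
        )
    # A Pipeline can be built from other Pipeline objects.
    return Pipeline(copies)


print(prediction_dataset_names(["a", "b", "c"]))
```

Each `<target>.predictions` name still needs a matching catalog entry if the output should be saved; a final node could take that list of names as inputs to concatenate the dataframes.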
Thank you for your help @Shubham Gupta! Although it is a tool to keep in mind, I think it would only fit my problem if I knew "beforehand" the set of targets through which I have to iterate and run the pipeline. The problem is that I may not have that set of targets, and that I cannot add the namespace parameters and catalog entries until I have the set (which may differ from one project to another). The PO would like to have a "dynamic template" so that, by changing only a `set_of_targets` parameter, it would automatically create those
I think there are 2 questions asked here. 1. How to avoid naming repeated datasets? A namespaced (modular) pipeline is the right thing to use here. 2. Dynamic pipelines are not really encouraged here, but they're not impossible to do.
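For the "dynamic names in the catalog" part, one option is to derive the entries from the list of targets instead of writing them by hand. The sketch below is plain Python and only builds dictionaries shaped like catalog entries. The `pandas.CSVDataSet` type, the folder, the `<target>.predictions` naming and the `build_catalog_entries` helper are all assumptions for illustration, and how such entries are registered with kedro is not shown.

```python
from typing import Dict, List


def build_catalog_entries(
    targets: List[str],
    folder: str = "data/07_model_output",  # assumed output location
) -> Dict[str, dict]:
    """Return one catalog-style entry per target, keyed by '<target>.predictions'."""
    return {
        f"{target}.predictions": {
            "type": "pandas.CSVDataSet",  # assumed dataset type
            "filepath": f"{folder}/predictions_{target}.csv",
        }
        for target in targets
    }


entries = build_catalog_entries(["a", "b", "c"])
for name, config in entries.items():
    print(name, "->", config["filepath"])
```

Changing only the list passed in changes the number and names of the entries, which is the behaviour the "dynamic template" request describes.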