# questions
**Ismail Ahmady:**
Hi team, hope you're well! qq, I followed the dynamic pipelines tutorial to create a dynamic model training/evaluation suite using Kedro. I would now like to retrieve all the evaluation files within the evaluation/ folder and pass them as inputs to a node, so I can get one consolidated output that recaps all the evaluation files I have per model. Any idea how I can achieve this with Kedro? I thought of passing an entire directory as an argument to a new node, but I'm not sure how to achieve that or whether it's optimal. Thanks for your help 😄
**marrrcin:**
I think it would be better to output an evaluation dataset from each sub-pipeline and then pass them all as inputs to an “aggregator” node, which will combine and save them.
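The directory idea from your message is also doable, though: a `PartitionedDataset` exposes a folder as a dict mapping each partition id to a load function, which a node can consume. A minimal sketch, assuming JSON evaluation files under a hypothetical `data/08_reporting/evaluation` folder (exact dataset type names depend on your Kedro version):

```python
# Hypothetical catalog entry (conf/base/catalog.yml), roughly:
#
# evaluation_files:
#   type: partitions.PartitionedDataset
#   path: data/08_reporting/evaluation
#   dataset: json.JSONDataset


def consolidate_evaluations(partitions: dict) -> dict:
    """Each value is a load function; call it to read that evaluation file."""
    return {model_name: load_func() for model_name, load_func in partitions.items()}
```

The node would then take `evaluation_files` as its input and return the consolidated recap; the aggregator approach keeps everything inside the pipeline graph instead.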
**Ismail Ahmady:**
hey, thanks a lot! couple of follow-up questions:
• are you suggesting having the aggregator within the `for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items()` loop?
• if so, does that mean I'd end up with some sort of recurring input/output that gets updated on each iteration? Not sure what that would look like
**marrrcin:**
```python
from kedro.pipeline import node, pipeline


def aggregator_pipeline():
    # Dummy single-node pipeline factory; each instance emits a "metrics" output.
    # print(...) returns None, so `print(version) or 666` returns the dummy value 666.
    p = lambda version: pipeline(
        [
            node(
                func=lambda: print(version) or 666,
                inputs=None,
                outputs="metrics",
                name="calculate_metrics",
            )
        ]
    )

    root_namespace = "model_training"
    versions = ["v1", "v2", "v3"]
    pipes = []
    for version in versions:
        # Namespacing renames each output, e.g. "model_training.v1.metrics".
        pipes.append(pipeline(p(version), namespace=f"{root_namespace}.{version}"))

    # The aggregator sits outside the loop and lists every namespaced
    # "metrics" dataset as its input.
    aggregation = pipeline(
        [
            node(
                func=lambda *args: print(args) or sum(args),
                inputs=[f"{root_namespace}.{version}.metrics" for version in versions],
                outputs="aggregated_metrics",
                name="aggregate_metrics",
            )
        ]
    )
    return sum(pipes) + aggregation
```
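To your bullet points: the aggregator goes after the loop, not inside it. Each iteration only contributes one more uniquely namespaced dataset name to the aggregator's input list, so there is no recurring input/output to keep mutating. With the tutorial's mapping, the same pattern would look roughly like this (a sketch, assuming `settings.DYNAMIC_PIPELINES_MAPPING` maps a namespace to a list of variants, e.g. `{"model_training": ["v1", "v2", "v3"]}`, and that each variant sub-pipeline outputs an `evaluation` dataset):

```python
from kedro.pipeline import node, pipeline

from my_project import settings  # hypothetical: your project's settings module


def aggregate_all(*evaluations):
    """Combine the per-variant evaluation outputs into one recap."""
    return list(evaluations)


# One input per generated sub-pipeline, collected from the same mapping
# that drives the dynamic pipeline generation.
aggregation = pipeline(
    [
        node(
            func=aggregate_all,
            inputs=[
                f"{namespace}.{variant}.evaluation"
                for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items()
                for variant in variants
            ],
            outputs="aggregated_evaluation",
            name="aggregate_evaluations",
        )
    ]
)
```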
**Ismail Ahmady:**
works, thank you 🦠
**j:**
Hi @Ismail Ahmady @marrrcin, I'm working on a similar problem; maybe you figured this out. Let's say I want to run each sub-pipeline (generated automatically) from a given node, e.g. from `train_model_node`. Do I then have to specify each sub-pipeline manually, or is there an automated way to do this? For example, in my case I get the message:
```
You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command:
--from-nodes
"sap_number.123.categorize_fails,sap_number.234.categorize_fails,sap_number.345.categorize_fails [...]
```