# questions
**Ismail Ahmady:**
Hi team, hope you're well! qq, I followed the dynamic pipelines tutorial to create a dynamic model training/evaluation suite using Kedro. I would now like to retrieve all the evaluation files within the evaluation/ folder and pass them as inputs to a node, so I can get one consolidated output that recaps all the evaluation files I have per model. Any idea how I can achieve this with Kedro? I thought of passing an entire directory as an argument to a new node, but I'm not sure how to achieve that or whether it's optimal. Thanks for your help 😄
**marrrcin:**
I think it would be better to output an evaluation dataset from each sub-pipeline and then pass them all as inputs to an “aggregator” node, which will combine and save them.
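The directory idea from your message is also doable, though: a `PartitionedDataset` exposes a folder as a dict mapping each partition id to a load function, which a node can consume. A minimal sketch, assuming JSON evaluation files under a hypothetical `data/08_reporting/evaluation` folder (exact dataset type names depend on your Kedro version):

```python
# Hypothetical catalog entry (conf/base/catalog.yml), roughly:
#
# evaluation_files:
#   type: partitions.PartitionedDataset
#   path: data/08_reporting/evaluation
#   dataset: json.JSONDataset


def consolidate_evaluations(partitions: dict) -> dict:
    """Each value is a load function; call it to read that evaluation file."""
    return {model_name: load_func() for model_name, load_func in partitions.items()}
```

The node would then take `evaluation_files` as its input and return the consolidated recap; the aggregator approach keeps everything inside the pipeline graph instead.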
**Ismail Ahmady:**
hey, thanks a lot! couple of follow-up questions:
• are you suggesting having the aggregator within the `for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items()` loop?
• if so, does that mean I'd end up with some sort of recurring input/output that gets updated on each iteration? Not sure what that would look like
**marrrcin:**
```python
from kedro.pipeline import node, pipeline


def aggregator_pipeline():
    # Dummy single-node pipeline factory; each instance emits a "metrics" output.
    # print(...) returns None, so `print(version) or 666` returns the dummy value 666.
    p = lambda version: pipeline(
        [
            node(
                func=lambda: print(version) or 666,
                inputs=None,
                outputs="metrics",
                name="calculate_metrics",
            )
        ]
    )

    root_namespace = "model_training"
    versions = ["v1", "v2", "v3"]
    pipes = []
    for version in versions:
        # Namespacing renames each output, e.g. "model_training.v1.metrics".
        pipes.append(pipeline(p(version), namespace=f"{root_namespace}.{version}"))

    # The aggregator sits outside the loop and lists every namespaced
    # "metrics" dataset as its input.
    aggregation = pipeline(
        [
            node(
                func=lambda *args: print(args) or sum(args),
                inputs=[f"{root_namespace}.{version}.metrics" for version in versions],
                outputs="aggregated_metrics",
                name="aggregate_metrics",
            )
        ]
    )
    return sum(pipes) + aggregation
```
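To your bullet points: the aggregator goes after the loop, not inside it. Each iteration only contributes one more uniquely namespaced dataset name to the aggregator's input list, so there is no recurring input/output to keep mutating. With the tutorial's mapping, the same pattern would look roughly like this (a sketch, assuming `settings.DYNAMIC_PIPELINES_MAPPING` maps a namespace to a list of variants, e.g. `{"model_training": ["v1", "v2", "v3"]}`, and that each variant sub-pipeline outputs an `evaluation` dataset):

```python
from kedro.pipeline import node, pipeline

from my_project import settings  # hypothetical: your project's settings module


def aggregate_all(*evaluations):
    """Combine the per-variant evaluation outputs into one recap."""
    return list(evaluations)


# One input per generated sub-pipeline, collected from the same mapping
# that drives the dynamic pipeline generation.
aggregation = pipeline(
    [
        node(
            func=aggregate_all,
            inputs=[
                f"{namespace}.{variant}.evaluation"
                for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items()
                for variant in variants
            ],
            outputs="aggregated_evaluation",
            name="aggregate_evaluations",
        )
    ]
)
```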
**Ismail Ahmady:**
works, thank you 🦠
**j:**
Hi @Ismail Ahmady @marrrcin, I'm working on a similar problem; maybe you figured this out. Let's say I want to run each sub-pipeline (generated automatically) from a given node, e.g. from `train_model_node`. Do I then have to specify each sub-pipeline manually, or is there an automated way to do this? For example, in my case I get the message:
```
You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command:
--from-nodes
"sap_number.123.categorize_fails,sap_number.234.categorize_fails,sap_number.345.categorize_fails [...]
```