# questions
v
Hello Kedro community, I am currently developing a project where I need to pass a dynamic number of catalog dataset entries as inputs to a node. The number of input datasets for this node depends on the primary input dataset being used, specifically the number of unique values in one field. For instance, this node expects three inputs: a column name (this is fixed and not dynamic), feature datasets, and target datasets. The node collates all of these datasets together into one object as its output.
• The number of feature and target datasets is dynamic: it can be 1 or 20. They all have catalog entries.
• I tried creating a list of catalog entry strings to be passed for the feature and target datasets, as below:
feature_df_list = [
    f"{group_name_cleaned}.features_with_clusters"
    for group_name_cleaned in groups_cleaned
]

target_df_list = [
    f"{group_name_cleaned}.target_with_clusters"
    for group_name_cleaned in groups_cleaned
]

input_dict = {
    "target_col": "params:target_col",
    "group_list": feature_df_list,
    "target_clusters_with_features": target_df_list,
}


node(
    func=collate_results,
    inputs=input_dict,
    outputs="run_collection",
),
• But Kedro treats the catalog entries in the list as plain strings and does not load the datasets they refer to. Please help me understand how best I can pass dynamic inputs to a node in Kedro :)
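As far as I understand, a node's `inputs` must be a flat string, a list of dataset-name strings, or a dict whose values are single dataset names; a list nested inside a dict value is not resolved against the catalog. A minimal sketch of the flattening approach (the `groups_cleaned` values below are hypothetical placeholders):

```python
# Hypothetical group names; in the real project these come from the data.
groups_cleaned = ["group_a", "group_b"]

feature_df_list = [f"{g}.features_with_clusters" for g in groups_cleaned]
target_df_list = [f"{g}.target_with_clusters" for g in groups_cleaned]

# A flat list of dataset names is something Kedro can resolve:
# each entry is loaded and passed to the function positionally.
flat_inputs = ["params:target_col", *feature_df_list, *target_df_list]
```

The function receiving these would then take the column name as its first positional argument and the remaining datasets via `*args`.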
r
Hi, have you tried using parameters in Kedro?
v
I am not sure how I would use that, Rashida. I have the params and catalog file set up. How would that help me pass dynamic inputs to a node? If you could share an example, that would be great šŸ™‚
r
Not sure if this example is relevant in your case
for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items():
    for variant in variants:
        pipes.append(
            pipeline(
                data_science_pipeline,
                inputs={"model_input_table": f"{namespace}.model_input_table"},
                namespace=f"{namespace}.{variant}",
                tags=[variant, namespace],
            )
        )
return sum(pipes)
It's from this blog - https://getindata.com/blog/kedro-dynamic-pipelines/
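For context, the mapping that loop iterates over is just a plain dict defined in `settings.py`. A hypothetical sketch of what it might contain (the namespace and variant names here are illustrative, not taken from the blog verbatim):

```python
# Hypothetical contents of settings.py for the pattern above:
# each key is a namespace, each value is a list of pipeline variants.
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["base", "candidate1", "candidate2"],
}

# The nested loop then produces one namespaced pipeline per pair.
pairs = [
    (namespace, variant)
    for namespace, variants in DYNAMIC_PIPELINES_MAPPING.items()
    for variant in variants
]
```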
šŸ™ 1
p
You probably want a preprocessing pipeline that creates data according to your groups, then use those as inputs. Check out namespaces; they helped me a lot with this, along with the blog post Rashida mentioned. I actually ended up implementing it.
šŸ‘ 1
v
Thank you both for pointing me to namespaces. Extremely helpful šŸ™Œ. I also want to create a node that collates the outputs from all namespaces into one summary output. Is there a way to pass all outputs created by the dynamic namespaces to a single node that collates them?
For instance, in the example Rashida shared, which has base, candidate1 & candidate2 namespaces and a regressor model for each, I want to create one node that takes the 3 models (this number is dynamic) as input.
r
Hi @Vinayak Singh, I haven't tried this myself, but in principle, the outputs of a node can serve as inputs to another node. If you define your outputs correctly in the DataCatalog, you should be able to reference them as inputs in a new node.
šŸ‘ 1
p
Maybe you can build your input dict beforehand by reading settings.DYNAMIC_PIPELINES_MAPPING.items()? That way you can populate your inputs with all used namespaces/variants and read them in the function using kwargs.
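A minimal sketch of that suggestion: deriving the collation node's input list from the mapping. The mapping contents and the `.regressor` dataset name are assumptions about how the namespaced model outputs are named, not the project's actual entries:

```python
# Hypothetical mapping mirroring settings.DYNAMIC_PIPELINES_MAPPING.
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["base", "candidate1", "candidate2"],
}

# One catalog entry per namespace/variant; "regressor" is an assumed
# name for the model dataset produced under each namespace.
collate_inputs = [
    f"{namespace}.{variant}.regressor"
    for namespace, variants in DYNAMIC_PIPELINES_MAPPING.items()
    for variant in variants
]
```

This list can then be passed as the `inputs` of the collation node, so it automatically grows or shrinks with the mapping.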
v
Thank you both for your responses. Great suggestion @Philipp Dahlke, I will try to do that.
m
If the inputs of the node are a dynamic list of catalog entries, those can be retrieved in the function by using `*args`:
def node_function_receiving_multiple_inputs(*args):
    # Kedro loads every dataset named in the node's inputs list and
    # passes the loaded objects to the function positionally.
    loaded_list = list(args)
    for data_input in loaded_list:
        # do stuff with each loaded dataset
        ...
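To illustrate the `*args` pattern end to end, here is a sketch in plain Python that simulates what Kedro does at runtime: it loads each catalog entry named in the node's `inputs` list and passes the loaded objects positionally. The function name, the `"price"` parameter value, and the dict stand-ins for loaded datasets are all hypothetical:

```python
def collate_results(target_col, *datasets):
    # First positional input is the fixed column name (from params);
    # the rest are however many datasets the pipeline wired in.
    return {"target_col": target_col, "n_datasets": len(datasets)}

# Simulating Kedro passing three loaded objects after the parameter:
result = collate_results("price", {"f": 1}, {"f": 2}, {"t": 3})
```

Because the datasets arrive positionally, the same function works whether the flat input list names 1 dataset or 20.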