# questions
v
Hello Kedro community, I am currently developing a project where I need to pass a dynamic number of catalog dataset entries as inputs to a node. The number of input datasets for this node depends on the primary input dataset being used, specifically the number of unique values in one field. For instance, this node expects three inputs: a column name (this is fixed and not dynamic), feature datasets, and target datasets. The node collates all of these datasets together into one object as its output.
• The number of feature and target datasets is dynamic: it can be 1 or 20. They all have catalog entries.
• I tried creating a list of catalog entry strings to be passed for the feature and target datasets, as below:
feature_df_list = [
    f"{group_name_cleaned}.features_with_clusters"
    for group_name_cleaned in groups_cleaned
]

target_df_list = [
    f"{group_name_cleaned}.target_with_clusters"
    for group_name_cleaned in groups_cleaned
]

input_dict = {
    "target_col": "params:target_col",
    "group_list": feature_df_list,
    "target_clusters_with_features": target_df_list,
}


node(
    func=collate_results,
    inputs=input_dict,
    outputs="run_collection",
),
• But Kedro treats the catalog entries in the list as plain strings and does not load the datasets they refer to. Please help me understand how best I can pass dynamic inputs to a node in Kedro :)
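As far as I understand, a node's `inputs` must be a flat string, a list of dataset-name strings, or a dict whose values are single dataset names; a list nested inside a dict value is not resolved against the catalog. A minimal sketch of the flattening approach (the `groups_cleaned` values below are hypothetical placeholders):

```python
# Hypothetical group names; in the real project these come from the data.
groups_cleaned = ["group_a", "group_b"]

feature_df_list = [f"{g}.features_with_clusters" for g in groups_cleaned]
target_df_list = [f"{g}.target_with_clusters" for g in groups_cleaned]

# A flat list of dataset names is something Kedro can resolve:
# each entry is loaded and passed to the function positionally.
flat_inputs = ["params:target_col", *feature_df_list, *target_df_list]
```

The function receiving these would then take the column name as its first positional argument and the remaining datasets via `*args`.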
r
Hi, have you tried using parameters in Kedro?
v
I am not sure how I would use that, Rashida. I have the params and catalog file set up. How would that help me pass dynamic inputs to a node? If you could share an example, that would be great šŸ™‚
r
Not sure if this example is relevant in your case
for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items():
    for variant in variants:
        pipes.append(
            pipeline(
                data_science_pipeline,
                inputs={"model_input_table": f"{namespace}.model_input_table"},
                namespace=f"{namespace}.{variant}",
                tags=[variant, namespace],
            )
        )
return sum(pipes)
It's from this blog - https://getindata.com/blog/kedro-dynamic-pipelines/
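For context, the mapping that loop iterates over is just a plain dict defined in `settings.py`. A hypothetical sketch of what it might contain (the namespace and variant names here are illustrative, not taken from the blog verbatim):

```python
# Hypothetical contents of settings.py for the pattern above:
# each key is a namespace, each value is a list of pipeline variants.
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["base", "candidate1", "candidate2"],
}

# The nested loop then produces one namespaced pipeline per pair.
pairs = [
    (namespace, variant)
    for namespace, variants in DYNAMIC_PIPELINES_MAPPING.items()
    for variant in variants
]
```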
šŸ™ 1
p
You probably want a preprocessing pipeline that creates data according to your groups, then use those as inputs. Check out namespaces; they helped me a lot with this, along with the blog post Rashida mentioned. I actually ended up implementing it.
šŸ‘ 1
v
Thank you both for pointing me to namespaces. Extremely helpful šŸ™Œ. I also want to create a node that collates the outputs from all namespaces into one summary output. Is there a way to pass all outputs created by the dynamic namespaces to a single node that collates them?
For instance, in the example Rashida shared, which has base, candidate1 & candidate2 namespaces and a regressor model for each, I want to create one node that takes the 3 models (this number is dynamic) as input.
r
Hi @Vinayak Singh, I haven't tried this myself, but in principle, the outputs of a node can serve as inputs to another node. If you define your outputs correctly in the DataCatalog, you should be able to reference them as inputs in a new node.
šŸ‘ 1
p
Maybe you can build your input dict beforehand by reading settings.DYNAMIC_PIPELINES_MAPPING.items()? That way you can populate your inputs with all used namespaces/variants and read them in the function using kwargs.
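A minimal sketch of that suggestion: deriving the collation node's input list from the mapping. The mapping contents and the `.regressor` dataset name are assumptions about how the namespaced model outputs are named, not the project's actual entries:

```python
# Hypothetical mapping mirroring settings.DYNAMIC_PIPELINES_MAPPING.
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["base", "candidate1", "candidate2"],
}

# One catalog entry per namespace/variant; "regressor" is an assumed
# name for the model dataset produced under each namespace.
collate_inputs = [
    f"{namespace}.{variant}.regressor"
    for namespace, variants in DYNAMIC_PIPELINES_MAPPING.items()
    for variant in variants
]
```

This list can then be passed as the `inputs` of the collation node, so it automatically grows or shrinks with the mapping.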
v
Thank you both for your responses. Great suggestion @Philipp Dahlke, I will try to do that.
m
If the inputs of the node are a dynamic list of catalog entries, those can be retrieved in the function by using `*args`:
def node_function_receiving_multiple_inputs(*args):
    # Kedro loads every dataset named in the node's inputs list and
    # passes the loaded objects to the function positionally.
    loaded_list = list(args)
    for data_input in loaded_list:
        # do stuff with each loaded dataset
        ...
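To illustrate the `*args` pattern end to end, here is a sketch in plain Python that simulates what Kedro does at runtime: it loads each catalog entry named in the node's `inputs` list and passes the loaded objects positionally. The function name, the `"price"` parameter value, and the dict stand-ins for loaded datasets are all hypothetical:

```python
def collate_results(target_col, *datasets):
    # First positional input is the fixed column name (from params);
    # the rest are however many datasets the pipeline wired in.
    return {"target_col": target_col, "n_datasets": len(datasets)}

# Simulating Kedro passing three loaded objects after the parameter:
result = collate_results("price", {"f": 1}, {"f": 2}, {"t": 3})
```

Because the datasets arrive positionally, the same function works whether the flat input list names 1 dataset or 20.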