Iñigo Hidalgo
04/25/2024, 9:11 AM
def node_fun(list_of_inputs, **kwargs):
    if len(list_of_inputs) == 1:
        original_logic
    else:
        new_logic
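(To make the branch concrete: a minimal sketch of what original_logic / new_logic could look like, assuming the inputs happen to be pandas DataFrames; the passthrough-vs-concat behaviour is purely illustrative, not something from this thread.)

import pandas as pd

def node_fun(list_of_inputs, **kwargs):
    # Illustrative assumption: a single input is passed through unchanged,
    # while multiple inputs are concatenated into one DataFrame.
    if len(list_of_inputs) == 1:
        return list_of_inputs[0]
    return pd.concat(list(list_of_inputs), ignore_index=True)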
Then what I have is a previous node which takes an arbitrary number of inputs and returns them as a list:

def gather_args_into_list(*args):
    """Utility function which gathers all arguments into a list. Useful to combine
    multiple kedro node outputs into a single list.
    """
    return list(args)
Then you would have:

Pipeline(
    [
        node(
            func=gather_args_into_list,
            inputs=[
                "any",
                "number",
                "of",
                "inputs",
            ],
            outputs="list_of_inputs",
        ),
        node(
            func=node_fun,
            inputs="list_of_inputs",
            outputs="output",
        ),
    ]
)
This allows for a variable number of inputs in the first node without needing to change the pipeline structure. It's a little bit awkward, but I would strongly encourage you to avoid using the catalog to load data manually within a pipeline, as that will make maintenance SO much more annoying down the line. I speak from experience as I currently maintain various projects where we do this, and it's impossible to keep track of where data is being loaded.
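Putting the two nodes together, here is a minimal sketch of how the whole thing could be parameterised over the list of input dataset names; the create_pipeline wrapper and its input_datasets argument are illustrative assumptions, not part of the snippet above.

from kedro.pipeline import Pipeline, node

# gather_args_into_list and node_fun are the functions defined above.

def create_pipeline(input_datasets):
    """Build the pipeline from a configurable list of input dataset names,
    so adding or removing inputs does not change the pipeline structure."""
    return Pipeline(
        [
            node(
                func=gather_args_into_list,
                inputs=list(input_datasets),  # e.g. ["raw_a", "raw_b", "raw_c"]
                outputs="list_of_inputs",
            ),
            node(
                func=node_fun,
                inputs="list_of_inputs",
                outputs="output",
            ),
        ]
    )

The list of names could then come from project configuration, so new inputs are added by editing config rather than the pipeline code.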