Hello everyone! Does anyone know how to pass a list of dataframes as an input to a pipeline node in Kedro? I have a function that takes a list of dataframes, but it doesn’t seem straightforward to implement.
Nodes can take single inputs as well as lists; you’ll just have to specify that your node input is a list.
Can your function just take *args? Or can you elaborate a bit on what the problem is, since you can define a list of inputs in the pipeline?
@Merel @Nok Lam Chan Sorry, I think I wasn’t clear enough! I’m trying to do something like the below in the node:
inputs=[[list of dataframes]]
because my function looks like this:
def f(dataframes: List[DataFrame]) -> DataFrame:
It’s currently returning an unhashable list error.
I see. In this case you can write a thin wrapper node function that builds the inputs however you like, and call your function f inside it. @datajoely is this the common way to do it? I can’t remember whether there is a good reason why we can’t resolve the inputs as a list/dict/tuple to match the function signature exactly.
so this should work
we know it doesn’t work if you try and map *args in modular pipelines
but I accept *args here
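For anyone finding this later, here is a minimal sketch of the thin wrapper with *args discussed above. The body of `f` is a made-up stand-in (the thread only shows its signature), and `f_node` and the dataset names in the note below are hypothetical.

```python
from typing import List

import pandas as pd


def f(dataframes: List[pd.DataFrame]) -> pd.DataFrame:
    # Stand-in body (assumption): the thread only shows the signature of f.
    return pd.concat(dataframes, ignore_index=True)


def f_node(*dataframes: pd.DataFrame) -> pd.DataFrame:
    # Each dataset listed in the node's inputs arrives as a separate
    # positional argument, so collect them and hand f the list it expects.
    return f(list(dataframes))


# Quick local check with two tiny frames (no Kedro needed for this part).
df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3]})
combined = f_node(df_a, df_b)
print(combined["x"].tolist())  # [1, 2, 3]
```

In the pipeline this would be wired with a flat list of dataset names, something like `node(f_node, inputs=["df_a", "df_b"], outputs="combined")`, rather than the nested `[[...]]` list that triggers the unhashable list error.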
Thank you @datajoely for sharing! Managed to fix the issue :)
There is a different approach that may help: use incremental datasets ☺️ You can create many dataframes as output from a node and they will be considered a single input! Same for input 👌
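A rough sketch of what the node side of that could look like. As far as I know, Kedro's partitioned dataset hands the node a dict of partition id to load function, while the incremental dataset hands over already-loaded data, so the helper below accepts either. `combine_partitions`, the partition ids and the stand-in body of `f` are all hypothetical.

```python
from typing import Callable, Dict, List, Union

import pandas as pd

# A partition is either loaded data or a zero-argument load function.
Partition = Union[pd.DataFrame, Callable[[], pd.DataFrame]]


def f(dataframes: List[pd.DataFrame]) -> pd.DataFrame:
    # Stand-in body (assumption): the thread only shows the signature of f.
    return pd.concat(dataframes, ignore_index=True)


def combine_partitions(partitions: Dict[str, Partition]) -> pd.DataFrame:
    frames = []
    # Sort by partition id so the order of the list is deterministic.
    for partition_id in sorted(partitions):
        part = partitions[partition_id]
        # Call lazy loaders; use already-loaded frames as they are.
        frames.append(part() if callable(part) else part)
    return f(frames)


# Local check with one lazy and one already-loaded partition.
partitions = {
    "part_b": lambda: pd.DataFrame({"x": [3]}),
    "part_a": pd.DataFrame({"x": [1, 2]}),
}
result = combine_partitions(partitions)
```

The node then has a single input (the partitioned or incremental catalog entry), and the list for `f` is built inside the function.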
I think creating a custom data catalog entry may also work in this case? Define your load and save functionality as desired to handle the list of dataframes.
