hey all, I know it's been asked many times but I have yet to find a solution for kedro node running order. I am building steps which create some tables in BigQuery (since the query is complex it is being done in a multi-stage way, so 01-query1.sql, 02.query2.sql etc.). Each of these is a node in kedro, but since my custom dataset implementation (creating tables in BigQuery) only implemented a
method, I define outputs as None
in the node. The question is: how can I create an ordered pipeline in kedro? I'm willing to hack the Pipeline class a bit, but there's too much going on in there, so I'm seeking some help here. Thanks in advance! 🙂
Yeah, I searched for this question but I'm not clear how this would work with dummy outputs here. My dataset only implements
method (I thought it's suited for that one) and if I take a dummy input that means I have to implement
operation to create data dependency here, no? For instance:
from kedro.pipeline import node
node1 = node(lambda x: x, inputs="create-table-1", outputs=None)
node2 = node(lambda x: x, inputs="create-table-2", outputs=None)
How would that work? Sorry if I am missing something obvious here
Yeah, but you don’t have to save anything in the
, just pass some dummy data
You can have:
node1 = node(lambda x: "not important", inputs="create-table-1", outputs="dummy")
node2 = node(lambda x, *args: "dummy2", inputs=["create-table-2", "dummy"], outputs=None)
the “dummy” dataset from
can be left out of the catalog if you’re using MemoryDataSet (which is the default)
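For anyone reading along, here is a rough stdlib-only sketch of why the dummy output forces the order. It is a toy model, not Kedro's actual code: the `nodes` dict and the `run_order` helper are made up for illustration, and it only assumes that run order comes from a topological sort over which node produces which dataset.

```python
from graphlib import TopologicalSorter

# Hypothetical node specs mirroring the example above: name -> (inputs, outputs).
# node2 is listed first on purpose, to show that declaration order does not matter.
nodes = {
    "node2": (["create-table-2", "dummy"], []),
    "node1": (["create-table-1"], ["dummy"]),
}


def run_order(nodes):
    """Return node names sorted so that producers run before their consumers."""
    # Map each dataset name to the node that produces it.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # A node depends on every node that produces one of its inputs.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(deps).static_order())


order = run_order(nodes)
print(order)  # node1 comes before node2 because node2 consumes "dummy"
```

Without the shared "dummy" dataset neither node depends on the other, so nothing pins down which one runs first.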
Ahh I see now, with multiple inputs, that hadn't occurred to me 😄 Thanks, will give that one a try. I think your example would be nice to have in the nodes documentation 👍
