# questions
f
hey all, I know it's been asked many times, but I have yet to find a solution for Kedro node running order. I am building steps that create some tables in BigQuery (since the query is complex, it is done in a multi-stage way: 01-query1.sql, 02-query2.sql, etc.). Each of these is a node in Kedro, but since my custom dataset implementation (creating tables in BigQuery) only implements a `load` method, I define the outputs as `None` in the node. The question is: how can I create an ordered pipeline in Kedro? I'm willing to hack the Pipeline class a bit, but there is too much going on in there, so I'm seeking some help here. Thanks in advance! 🙂
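For context, the custom dataset looks roughly like the sketch below (not the exact implementation: the class name `CreateTableDataSet`, the `sql_filepath`/`project` parameters, and the return value are made up for illustration, and depending on your Kedro version the base class is `AbstractDataSet` or `AbstractDataset`):

```python
# A minimal sketch of a "load-only" dataset that runs one SQL file against BigQuery.
from pathlib import Path
from typing import Any, Dict, Optional

from google.cloud import bigquery
from kedro.io import AbstractDataSet


class CreateTableDataSet(AbstractDataSet):
    def __init__(self, sql_filepath: str, project: Optional[str] = None):
        self._sql_filepath = Path(sql_filepath)
        self._project = project

    def _load(self) -> str:
        # "Loading" here means executing the CREATE TABLE ... statement in BigQuery.
        client = bigquery.Client(project=self._project)
        client.query(self._sql_filepath.read_text()).result()  # wait for the job to finish
        return "table created"

    def _save(self, data: Any) -> None:
        raise NotImplementedError("This dataset only creates tables via load().")

    def _describe(self) -> Dict[str, Any]:
        return {"sql_filepath": str(self._sql_filepath), "project": self._project}
```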
f
Yeah, I searched for this question, but I'm not clear how this would work with dummy outputs here. My dataset only implements the `load` method (I thought that was the one suited for this), and if I take a dummy input, doesn't that mean I have to implement a `save` operation to create the data dependency here? For instance:
node1(lambda x: x, inputs="create-table-1", outputs=None)
node2(lambda x: x, inputs="create-table-2", outputs=None)
How would that work? Sorry if I am missing something obvious here.
m
Yeah, but you don't have to actually save anything in `save`, just pass some dummy data. You can have:
node1(lambda x: "not important", inputs="create-table-1", outputs="dummy")
node2(lambda x, *args: "dummy2", inputs=["create-table-2", "dummy"], outputs=None)
The "dummy" dataset from `outputs` can be skipped from the catalog if you're using MemoryDataSet (which is the default).
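Put together, an ordered chain along these lines could look roughly like the sketch below (assuming `create-table-1/2/3` are catalog entries backed by your load-only dataset; the `step*_done` names are arbitrary dummy datasets that default to MemoryDataSet):

```python
from kedro.pipeline import Pipeline, node


def run_step(*args):
    # The real work happens when Kedro loads the "create-table-*" dataset;
    # the node just returns a token so the next node can depend on it.
    return "done"


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(run_step, inputs="create-table-1", outputs="step1_done"),
            # Listing "step1_done" as an input forces this node to run second.
            node(run_step, inputs=["create-table-2", "step1_done"], outputs="step2_done"),
            # The last node needs no dummy output.
            node(run_step, inputs=["create-table-3", "step2_done"], outputs=None),
        ]
    )
```

Kedro then works out the running order from these data dependencies, so you don't need to touch the Pipeline class at all.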
f
Ahh, I see now: multiple inputs hadn't occurred to me so far 😄 Thanks, I will give that one a try. I think your example would be nice to have in the nodes documentation 👍