Andrej Zachar
03/20/2023, 1:36 AM
node(
    first_namespace_fn,
    inputs=["some_input"],
    outputs="shared_name_so_it_can_reused_somewhere_else",
    namespace="first"
),
node(
    second_namespace_fn,
    inputs=None,
    outputs="shared_name_so_it_can_reused_somewhere_else",
    namespace="second"
),
node(
    third_common_fn,
    inputs="shared_name_so_it_can_reused_somewhere_else",
    outputs="final_output",
),
Thank you!
Deepyaman Datta
03/20/2023, 1:58 AM
namespace on a node; I don't think it does any input remapping. I assume what you're saying in this case is that you will either run the first and third node or the second and third node, but not first and second in the same run? In that case, it should be possible.
When using a modular pipeline multiple times, you can use a mapping dictionary to say that a catalog entry shouldn't get namespaced. For example:
cook_breakfast_pipeline = pipeline(
    [
        node(func=defrost, inputs="frozen_potatoes", outputs="veg", name="defrost_node"),
        node(func=sauté, inputs="veg", outputs="breakfast_potatoes"),
    ]
)
cook_lunch_pipeline = pipeline(
    [
        node(func=defrost, inputs="frozen_carrots", outputs="veg", name="defrost_node"),
        node(func=blanch, inputs="veg", outputs="cooked_veggies"),
    ]
)
eat_veggies = pipeline(
    [
        node(func=nom, inputs="cooked_veggies", outputs="leftovers", name="consume")
    ]
)
# Run either of the below pipelines, not both at once
eat_breakfast_pipeline = pipeline(
    pipe=cook_breakfast_pipeline,
    outputs={"breakfast_potatoes": "cooked_veggies"},
    namespace="breakfast",
)
eat_lunch_pipeline = pipeline(
    pipe=cook_lunch_pipeline,
    outputs="cooked_veggies",  # Alternatively, `outputs={"cooked_veggies": "cooked_veggies"},`
    namespace="lunch",
)
from kedro.pipeline import pipeline, node
# Make dummy nodes
defrost = lambda x:x
sauté = lambda x:x
blanch = lambda x:x
nom = lambda x:x
# Copy/paste above code here
...
# Then if you run:
>>> eat_breakfast_pipeline.inputs()
{'breakfast.frozen_potatoes'}
>>> eat_breakfast_pipeline.outputs()
{'cooked_veggies'}
>>> eat_lunch_pipeline.inputs()
{'lunch.frozen_carrots'}
>>> eat_lunch_pipeline.outputs()
{'cooked_veggies'}
Is this what you wanted?
Andrej Zachar
03/20/2023, 8:52 AM
Deepyaman Datta
03/20/2023, 1:41 PM
kedro run --tag breakfast
Andrej Zachar
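03/20/2023, 1:43 PM
How can I create a pipeline that would be a combination of:
eat_breakfast_pipeline + eat_lunch_pipeline + clean_kitchen
where the clean_kitchen pipeline would expect to get ['cooked_veggies'] as input and, depending on the namespace or tag that was applied to eat_lunch_pipeline or eat_breakfast_pipeline, it would produce cooked_veggies
If I do so, then it says:
Output(s) ['cooked_veggies'] are returned by more than one nodes. Node outputs must be unique.
Also, I am not sure how to declare it so that find_pipelines picks it up automatically.
Thank you again for your help @Deepyaman Datta
Deepyaman Datta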
03/20/2023, 5:48 PM
> How can I create a pipeline that would be a combination of:
> eat_breakfast_pipeline + eat_lunch_pipeline + clean_kitchen
> where the clean_kitchen pipeline would expect to get ['cooked_veggies'] as input and, depending on the namespace or tag that was applied to eat_lunch_pipeline or eat_breakfast_pipeline, it would produce cooked_veggies
> If I do so, then it says:
> Output(s) ['cooked_veggies'] are returned by more than one nodes. Node outputs must be unique.
You can't do this, because you're creating an invalid Kedro pipeline (as mentioned in that error). Even in my example, I wrote:
# Run either of the below pipelines, not both at once
Andrej Zachar
03/20/2023, 5:55 PM
Deepyaman Datta
03/20/2023, 6:02 PM
> And do you plan to support such a scenario?
Not that I know of; allowing two nodes in the same pipeline to write to a single output causes nondeterministic behavior. It is possible to get this same behavior by defining pipelines separately, so that seems to be the way to go.
> Also, it would be cool to have one example project with tags and namespaces, so I do not need to bother you here ;)
Can you create an issue on github.com/kedro-org/kedro/issues to request this? That way, the broader team can decide if they will add this and how to structure the docs.
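To make the uniqueness rule above concrete, here is a toy model in plain Python. It is not Kedro's implementation: the combine helper, the tuple layout, and the node names are all made up for this sketch, and the error text is only loosely modelled on the one quoted in the thread. It shows why breakfast + lunch in one pipeline is rejected, while two separately defined combinations each pass.

```python
from collections import Counter

# Toy stand-in for a pipeline: a list of (node_name, inputs, outputs) tuples.
# Hypothetical structure for illustration only, not Kedro's internal model.
breakfast = [
    ("breakfast.defrost_node", ["breakfast.frozen_potatoes"], ["breakfast.veg"]),
    ("breakfast.saute", ["breakfast.veg"], ["cooked_veggies"]),
]
lunch = [
    ("lunch.defrost_node", ["lunch.frozen_carrots"], ["lunch.veg"]),
    ("lunch.blanch", ["lunch.veg"], ["cooked_veggies"]),
]
eat_veggies = [("consume", ["cooked_veggies"], ["leftovers"])]


def combine(*pipelines):
    """Concatenate toy pipelines, rejecting datasets written by more than one node."""
    nodes = [node for pipe in pipelines for node in pipe]
    counts = Counter(out for _name, _inputs, outputs in nodes for out in outputs)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Outputs {duplicates} are written by more than one node")
    return nodes


# Defined separately, each combination has exactly one producer of cooked_veggies.
breakfast_run = combine(breakfast, eat_veggies)
lunch_run = combine(lunch, eat_veggies)

# Combined into a single pipeline, two nodes write cooked_veggies, so it is rejected.
try:
    combine(breakfast, lunch, eat_veggies)
    conflict = None
except ValueError as err:
    conflict = str(err)

print(conflict)
```

In a real project the two valid combinations would presumably be registered as two separate pipelines and run one at a time, which matches the "run either, not both at once" comment in the earlier example.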
Andrej Zachar
03/20/2023, 6:25 PM