Kasper Janehag
09/04/2023, 12:27 PM
from kedro.pipeline import Pipeline, node, pipeline


def mock_function(input_1, input_2):
    return None


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=mock_function,
                inputs=["dataset_1", "dataset_2"],
                outputs="dataset_3",
                name="first_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_3", "dataset_4"],
                outputs="dataset_5",
                name="second_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_5", "dataset_6"],
                outputs="dataset_7",
                name="third_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_7", "dataset_8"],
                outputs="dataset_9",
                name="fourth_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_9", "dataset_10"],
                outputs="dataset_11",
                name="fifth_node",
                namespace="namespace_prefix_1",
            ),
        ]
    )
This looks like Screenshot 1 in the expanded view and Screenshot 2 in the collapsed view. When the namespace_prefix_1 group is collapsed, dataset_9 and dataset_10 are still visible (but not connected), even though they only exist within that namespace. Is this intentional? If so, how should I work around it?
Question 2
In my second example, I have the same pipeline, but dataset_3 uses the same prefix as the namespace group (namespace_prefix_1):
from kedro.pipeline import Pipeline, node, pipeline


def mock_function(input_1, input_2):
    return None


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=mock_function,
                inputs=["dataset_1", "dataset_2"],
                outputs="namespace_prefix_1.dataset_3",
                name="first_node",
            ),
            node(
                func=mock_function,
                inputs=["namespace_prefix_1.dataset_3", "dataset_4"],
                outputs="dataset_5",
                name="second_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_5", "dataset_6"],
                outputs="dataset_7",
                name="third_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_7", "dataset_8"],
                outputs="dataset_9",
                name="fourth_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_9", "dataset_10"],
                outputs="dataset_11",
                name="fifth_node",
                namespace="namespace_prefix_1",
            ),
        ]
    )
When the namespace_prefix_1 group is collapsed, dataset_3 disappears from the DAG (see Screenshot 3), even though it is not meant to be part of that namespace (it just uses the same prefix), and this breaks the DAG. Is this intentional? If so, how should I work around it?
Juan Luis
09/04/2023, 4:56 PM
Rashida Kanchwala
09/04/2023, 10:30 PM
Kasper Janehag
09/05/2023, 6:28 AM
Rashida Kanchwala
09/05/2023, 7:42 AM
    node(
        func=mock_function,
        inputs=["dataset_5", "dataset_6"],
        outputs="dataset_7",
        name="third_node",
        namespace="namespace_prefix_1",
    ),
    node(
        func=mock_function,
        inputs=["dataset_7", "dataset_8"],
        outputs="dataset_9",
        name="fourth_node",
        namespace="namespace_prefix_1",
    ),
third_node and fourth_node belong to the modular pipeline 'namespace_prefix_1', and that's why they are hidden in the first screenshot. dataset_7 is an output of third_node, which means it is an output of the modular pipeline 'namespace_prefix_1'. dataset_7 is also an input to fourth_node, which means it is an input of the modular pipeline 'namespace_prefix_1'.
So dataset_7 is both an input and an output of that modular pipeline, which creates the cycle.
This is a known issue, as I said: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/docs/assets/expand_collapse_modular_pipelines_presentation.pdf. You can read about it in the doc I shared, in the last few slides on edge cases. We would be very interested in solving this issue for our users, but so far we haven't come up with the best way.
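The overlap described above can be reproduced with a rough sketch in plain Python. It does not use Kedro or Kedro-Viz, and the NAMESPACED_NODES mapping and the group_inputs_and_outputs helper are hypothetical names made up for illustration. The sketch treats the three namespaced nodes from the first example as one group, then lists the datasets that the group both consumes and produces:

```python
# Hypothetical, simplified model of the namespaced nodes from the thread:
# node name -> (inputs, outputs). This is not Kedro or Kedro-Viz code.
NAMESPACED_NODES = {
    "third_node": (["dataset_5", "dataset_6"], ["dataset_7"]),
    "fourth_node": (["dataset_7", "dataset_8"], ["dataset_9"]),
    "fifth_node": (["dataset_9", "dataset_10"], ["dataset_11"]),
}


def group_inputs_and_outputs(nodes):
    """Collect every dataset consumed and every dataset produced by the group."""
    consumed, produced = set(), set()
    for inputs, outputs in nodes.values():
        consumed.update(inputs)
        produced.update(outputs)
    return consumed, produced


consumed, produced = group_inputs_and_outputs(NAMESPACED_NODES)

# Datasets that are both an input and an output of the collapsed group.
both = sorted(consumed & produced)
print(both)  # ['dataset_7', 'dataset_9']
```

Under this simplified model, dataset_7 and dataset_9 fall in both sets, which is the situation described for dataset_7 above. dataset_10 is only consumed by the group and never produced inside it.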