Hi I have two questions regarding unexpected behaviour of sh Kedro #questions

Kasper Janehag

09/04/2023, 12:27 PM

Hi. I have two questions regarding unexpected behaviour when showing and hiding datasets in Kedro Viz while working with namespaced Kedro pipelines.

Question 1

In my first example, I have this simple pipeline:

from kedro.pipeline import Pipeline, node, pipeline


def mock_function(input_1, input_2):
    return None


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=mock_function,
                inputs=["dataset_1", "dataset_2"],
                outputs="dataset_3",
                name="first_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_3", "dataset_4"],
                outputs="dataset_5",
                name="second_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_5", "dataset_6"],
                outputs="dataset_7",
                name="third_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_7", "dataset_8"],
                outputs="dataset_9",
                name="fourth_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_9", "dataset_10"],
                outputs="dataset_11",
                name="fifth_node",
                namespace="namespace_prefix_1",
            ),
        ]
    )

which looks like Screenshot 1 in expanded view and Screenshot 2 in collapsed view. When the namespace_prefix_1 group is collapsed, dataset_9 and dataset_10 are visible (but not connected), even though they only exist within that namespace. Is this intentional? If so, how should I work around it?

Question 2

In my second example, I have the same pipeline, but dataset_3 uses the same prefix as the namespace group (namespace_prefix_1):

from kedro.pipeline import Pipeline, node, pipeline


def mock_function(input_1, input_2):
    return None


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=mock_function,
                inputs=["dataset_1", "dataset_2"],
                outputs="namespace_prefix_1.dataset_3",
                name="first_node",
            ),
            node(
                func=mock_function,
                inputs=["namespace_prefix_1.dataset_3", "dataset_4"],
                outputs="dataset_5",
                name="second_node",
            ),
            node(
                func=mock_function,
                inputs=["dataset_5", "dataset_6"],
                outputs="dataset_7",
                name="third_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_7", "dataset_8"],
                outputs="dataset_9",
                name="fourth_node",
                namespace="namespace_prefix_1",
            ),
            node(
                func=mock_function,
                inputs=["dataset_9", "dataset_10"],
                outputs="dataset_11",
                name="fifth_node",
                namespace="namespace_prefix_1",
            ),
        ]
    )

When the namespace_prefix_1 group is collapsed, dataset_3 disappears from the DAG (see Screenshot 3), even though it's not meant to be part of that namespace (it just uses the same prefix), and this breaks the DAG. Is this intentional? If so, how should I work around it?

Kasper Janehag

09/04/2023, 1:39 PM

@Juan Luis Cano any ideas?

Juan Luis

09/04/2023, 4:56 PM

@Ravi Kumar Pilla @Rashida Kanchwala this issue rings a bell, any clue of what might be happening here?

Rashida Kanchwala

09/04/2023, 10:30 PM

OK, so a couple of things.

1. Regarding the first question: it is a known issue. Kedro-Viz renders the pipeline as a DAG, and the introduction of modular pipelines in Kedro-Viz has sometimes resulted in cyclic dependencies. Specifically, when you collapse the modular pipeline in your scenario, both Dataset 7 and 9 act as inputs and outputs. This creates a cyclic dependency that disrupts the DAG, which is why we intentionally remove those links. While this is an edge case, I believe there's room for improvement. Currently, it's not clear to users and could appear broken. What would be your expectations or suggestions for this?
2. For the second issue: the prefix of Dataset 3 suggests it's part of that modular pipeline. It isn't an input or output but is an internal dataset of the modular pipeline. That's why you see Dataset 3 when "namespace_prefix_1" is expanded, but it disappears once you collapse it.

Hope that helps.

Kasper Janehag

09/05/2023, 6:28 AM

Hi @Rashida Kanchwala, thanks for reaching out! 🙂

Regarding 1: Dataset 7 and 9 are not cyclic dependencies, see first page. They are only intermediate datasets within a modular pipeline. I agree with you that they're both input and output datasets, but that's true for Dataset 3 as well.

Regarding 2: So your suggestion on this part is simply to not use the same prefix if the dataset is not actually part of that pipeline module?

Kasper Janehag

09/05/2023, 6:28 AM

@Ankar Yadav

Rashida Kanchwala

09/05/2023, 7:42 AM

Technically they are not, but the modular pipeline ('namespace_prefix_1') introduces this cycle:

node(
    func=mock_function,
    inputs=["dataset_5", "dataset_6"],
    outputs="dataset_7",
    name="third_node",
    namespace="namespace_prefix_1",
),
node(
    func=mock_function,
    inputs=["dataset_7", "dataset_8"],
    outputs="dataset_9",
    name="fourth_node",
    namespace="namespace_prefix_1",
),

third_node and fourth_node belong to the modular pipeline 'namespace_prefix_1', and that's why they are hidden in the first screenshot. Dataset_7 is an output of third_node, which means it is an output of the modular pipeline 'namespace_prefix_1'. Dataset_7 is also an input to fourth_node, which means it is an input of the modular pipeline 'namespace_prefix_1'. This means it's both an input and an output of that modular pipeline, which creates the cycle.

This is a known issue, like I said: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/docs/assets/expand_collapse_modular_pipelines_presentation.pdf. You can read about it in the doc I shared, towards the last few slides on Edge Cases. We would be very interested in solving this issue for our users, but so far we haven't come up with the best way.
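The reasoning in this message can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not Kedro-Viz's actual implementation: the `NODES` table and the `collapsed_io` helper are hypothetical names introduced here, and the node specs are copied from the pipeline in the question. It treats the collapsed namespace as a single super-node and lists the datasets that end up as both an input and an output of it.

```python
# Node specs copied from the question: (name, inputs, outputs, namespace).
# NODES and collapsed_io are illustrative names, not part of Kedro or Kedro-Viz.
NODES = [
    ("first_node", ["dataset_1", "dataset_2"], ["dataset_3"], None),
    ("second_node", ["dataset_3", "dataset_4"], ["dataset_5"], None),
    ("third_node", ["dataset_5", "dataset_6"], ["dataset_7"], "namespace_prefix_1"),
    ("fourth_node", ["dataset_7", "dataset_8"], ["dataset_9"], "namespace_prefix_1"),
    ("fifth_node", ["dataset_9", "dataset_10"], ["dataset_11"], "namespace_prefix_1"),
]


def collapsed_io(nodes, namespace):
    """Return (inputs, outputs) of a namespace treated as one collapsed super-node."""
    inputs, outputs = set(), set()
    for _name, node_inputs, node_outputs, node_namespace in nodes:
        if node_namespace == namespace:
            inputs.update(node_inputs)
            outputs.update(node_outputs)
    return inputs, outputs


group_inputs, group_outputs = collapsed_io(NODES, "namespace_prefix_1")

# A dataset that is both consumed and produced by the collapsed group would need
# an edge into and an edge out of the same super-node, which is a cycle.
cyclic = sorted(group_inputs & group_outputs)
print(cyclic)  # ['dataset_7', 'dataset_9']
```

Under these assumptions, dataset_7 and dataset_9 are exactly the datasets whose links would have to be dropped to keep the collapsed graph acyclic.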

Rashida Kanchwala

09/05/2023, 7:43 AM

Regarding 2, yes: if it's not part of the modular pipeline, then we shouldn't add the prefix.

Rashida Kanchwala

09/05/2023, 7:44 AM

On second thoughts, if Dataset 7 and Dataset 9 are intermediary datasets within a modular pipeline, then they should have a prefix like you have for Dataset_3, and they will be seen only when you expand that modular pipeline, not in the collapsed view.
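A rough way to picture the prefix suggestion, assuming (as described in this thread) that a dataset whose name starts with "namespace_prefix_1." is treated as internal to that modular pipeline and hidden when it is collapsed. The helpers `is_internal` and `visible_when_collapsed` are hypothetical and only model that naming rule; they are not Kedro-Viz functions.

```python
NAMESPACE = "namespace_prefix_1"


def is_internal(dataset_name, namespace=NAMESPACE):
    """Assumed rule: names starting with '<namespace>.' belong to that namespace."""
    return dataset_name.startswith(namespace + ".")


def visible_when_collapsed(dataset_names, namespace=NAMESPACE):
    """Datasets that would still be shown once the namespace is collapsed."""
    return [name for name in dataset_names if not is_internal(name, namespace)]


# Datasets touched by the namespaced nodes, without and with the prefix
# on the two intermediate datasets.
before = ["dataset_5", "dataset_7", "dataset_9", "dataset_11"]
after = ["dataset_5", f"{NAMESPACE}.dataset_7", f"{NAMESPACE}.dataset_9", "dataset_11"]

print(visible_when_collapsed(before))  # ['dataset_5', 'dataset_7', 'dataset_9', 'dataset_11']
print(visible_when_collapsed(after))   # ['dataset_5', 'dataset_11']

# The same rule explains Question 2: the prefixed dataset_3 is treated as internal.
print(is_internal("namespace_prefix_1.dataset_3"))  # True
```

The same naming rule cuts both ways: prefixing the intermediate datasets hides them in the collapsed view, while a dataset outside the namespace, such as dataset_3 in the second example, should not carry the prefix.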
