Hi all, I get this weird bug with kedro viz: i h...
# questions
h
Hi all, I get this weird bug with kedro viz: I have a set of nested modular_pipelines to keep my training/test and finetuning pipelines completely DRY across different languages. To that end, I map the train and test splits through my pipelines to make sure I namespace them at the last moment. But when I visualize, I see these two unconnected artifacts, Test and Train:
Which is weird, because when I open the nested pipelines these disappear, and they can only be found in this state again by refreshing the browser:
But you can see that the train and test datasets are represented twice in the visualisation,
so if I hide one, for example:
I am decently sure this is not an issue with my code, but it is very difficult to debug.
n
Are you using namespaces?
h
yes
not everywhere though
n
viz uses the "." notation to detect if something is a namespaced pipeline
did you use it for something which doesn’t belong to a namespace?
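To make the question concrete, here is a minimal sketch of what reading a dotted name as a namespace chain could look like. It is not Kedro-Viz's actual code: the helper `namespace_ancestors` is hypothetical and only assumes that everything before the last dot is treated as nested namespaces.

```python
def namespace_ancestors(name: str) -> list[str]:
    """Return the namespaces implied by a dotted name, outermost first.

    Hypothetical helper for illustration only: everything before the last
    dot is read as a chain of nested namespaces.
    """
    # Parameters carry a "params:" prefix that is not part of the namespace.
    bare = name.removeprefix("params:")
    parts = bare.split(".")[:-1]  # drop the final segment (the item's own name)
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


print(namespace_ancestors("finnish.train.hf_dataset"))  # ['finnish', 'finnish.train']
print(namespace_ancestors("finnish.train"))             # ['finnish']
print(namespace_ancestors("finnish_jobs"))              # []
```

Under this reading, any name containing a dot is placed inside a namespace, whether or not a pipeline was actually created with that namespace.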
h
no
so basically, train and test are namespaced
and then I namespace per language
e.g. finnish
n
can you share your catalog?
h
so train and test are both MemoryDataSets
not defined in the catalog (because that's pretty difficult with namespacing atm)
n
how is it defined in your pipeline?
h
full_pipeline = train_test_pipeline + train_pipeline + evaluation_pipeline

finnish_pipeline = modular_pipeline(
    pipe=full_pipeline,
    inputs={"filtered_n_validated_data": "finnish_jobs"},
    outputs={},
    namespace="finnish",
)
this full_pipeline is going to be re-used several times
within different namespaces
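As a rough model of what wrapping `full_pipeline` in a namespace does to dataset names, the sketch below applies the renaming rule as I understand it from Kedro's modular pipelines: explicitly mapped names take the mapped value, parameters get the namespace after `params:`, and everything else is prefixed. This is a simplification, not Kedro's code; the `rename` function and the `languages` dict are hypothetical.

```python
def rename(name: str, namespace: str, mapping: dict[str, str]) -> str:
    """Approximate how a dataset name changes when a pipeline is namespaced."""
    if name in mapping:
        # Explicitly mapped inputs/outputs take the mapped name, unprefixed.
        return mapping[name]
    if name.startswith("params:"):
        # Parameters get the namespace inserted after the "params:" prefix.
        return f"params:{namespace}.{name[len('params:'):]}"
    # Every other dataset is prefixed with the namespace.
    return f"{namespace}.{name}"


# Hypothetical registry: one source table per language namespace.
languages = {"finnish": "finnish_jobs"}

# Dataset names used by the train/test split node in this thread.
split_node_names = ["filtered_n_validated_data", "params:sample_size_frac", "train", "test"]

for language, source in languages.items():
    mapping = {"filtered_n_validated_data": source}
    print([rename(n, language, mapping) for n in split_node_names])
# ['finnish_jobs', 'params:finnish.sample_size_frac', 'finnish.train', 'finnish.test']
```

Adding another language would mean adding one more entry to `languages`; the names inside each namespace stay identical apart from the prefix.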
n
"so train and test are both memorydatasets"
Can you show the pipeline for this?
h
evaluation_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(inference_tag=True, **kwargs),
            inputs={"X_n_y": "test"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=test_model_pipeline,
            inputs={
                "test": "hf_dataset",  # or: train.hf_dataset
                # 'test':'test.hf_dataset',
            },
        )
    ),
    inputs={
        "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="test",
)
train_test_pipeline = create_train_test_pipeline(**kwargs)

train_model_pipeline = create_train_model_pipeline(**kwargs)
test_model_pipeline = create_predict_pipeline(**kwargs)
train_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(**kwargs),
            inputs={"X_n_y": "train"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=train_model_pipeline,
            inputs={
                "train": "hf_dataset",  # or: train.hf_dataset
            },
        )
    ),
    inputs={
        "train": "train",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="train",
)
def create_train_test_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=isco_balance_train_test_split,
                inputs={
                    "df": "filtered_n_validated_data",
                    "sample_size_frac": "params:sample_size_frac",
                    **kwargs,
                },
                outputs=["train", "test"],
                name="isco_balanced_train_test_split",
            ),
        ]
    )  # type: ignore
those are all the relevant entries
n
inputs={
    "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
    "test": "test",
    "isco_names": "isco_names",
},
So you did define test as a dataset?
h
no
test is just the output of the test_train_split
n
This is very hard to follow. Of the items in your screenshot, which one is the unexpected one?
• finnish.train (dataset)
• finnish.test (dataset)
• finnish.train (a modular pipeline)
• finnish.test (a modular pipeline)
• train
• test
and maybe running kedro catalog list will help to debug
h
train and test are the unexpected ones
expected:
• finnish.train (dataset)
• finnish.test (dataset)
• finnish.train (a modular pipeline)
• finnish.test (a modular pipeline)
test <-> finnish.test (dataset)
train <-> finnish.train
somehow these are connected in the GUI,
so kedro makes them out to be the same dataset,
and I can't find where these un-namespaced test and train datasets are coming from
when I unfold the viz, they are shown to correspond to the namespaced finnish.train datasets
n
I am not really sure. But maybe that's because you are using the same name for a namespaced pipeline and for a dataset
can you try using something other than finnish.train and finnish.test as the dataset names?
h
DataSets in '__default__' pipeline:
  Datasets mentioned in pipeline:
    DefaultDataSet:
    - finnish.test.y
    - finnish.train.finetuned_pre_trained_isco_classifier
    - finnish.test.hf_dataset
    - finnish.test.isco_classification_metrics
    - finnish.isco_names
    - params:finnish.train.finetuner.pretrained_model_name
    - finnish.train.tokenized_X
    - finnish.test.tokenized_X
    - finnish.train
    - params:finnish.train.finetuner.training_args
    - finnish.train.X
    - finnish.test
    - params:finnish.test.finetuner.pretrained_model_name
    - finnish.test.predicted_isco
    - finnish.train.hf_dataset
    - finnish.train.preprocessed_X
    - finnish.train.y
    - finnish.test.X
    - finnish.test.mapped_dataset
    - finnish.train.mapped_dataset
    - finnish.test.preprocessed_X
    - finnish.train.pre_trained_transformer
    - params:finnish.sample_size_frac
    SQLQueryDataSet:
    - finnish_jobs
so from the catalog list, one can deduce that the un-namespaced train and test datasets are truly just artefacts in kedro viz
at least, they don't show up here
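The same deduction can be checked mechanically. The sketch below takes the names copied from the `kedro catalog list` output above and reports everything that sits outside the `finnish` namespace; the helper `outside_namespace` is hypothetical and only does string matching.

```python
# Names copied from the `kedro catalog list` output in this thread.
listed = """
finnish.test.y
finnish.train.finetuned_pre_trained_isco_classifier
finnish.test.hf_dataset
finnish.test.isco_classification_metrics
finnish.isco_names
params:finnish.train.finetuner.pretrained_model_name
finnish.train.tokenized_X
finnish.test.tokenized_X
finnish.train
params:finnish.train.finetuner.training_args
finnish.train.X
finnish.test
params:finnish.test.finetuner.pretrained_model_name
finnish.test.predicted_isco
finnish.train.hf_dataset
finnish.train.preprocessed_X
finnish.train.y
finnish.test.X
finnish.test.mapped_dataset
finnish.train.mapped_dataset
finnish.test.preprocessed_X
finnish.train.pre_trained_transformer
params:finnish.sample_size_frac
finnish_jobs
""".split()


def outside_namespace(names: list[str], namespace: str) -> list[str]:
    """Return the names that do not live under `namespace` (hypothetical helper)."""
    prefix = namespace + "."
    # Strip the "params:" marker first so parameters are compared on their dotted path.
    return sorted(n for n in names if not n.removeprefix("params:").startswith(prefix))


print(outside_namespace(listed, "finnish"))  # ['finnish_jobs']
print({"train", "test"} & set(listed))       # set()
```

Only the mapped input `finnish_jobs` lives outside the namespace, and neither a bare `train` nor a bare `test` appears in the list.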
n
can you try using something else as the dataset name other than finnish.train and finnish.test?
h
okay
it's still there
and when i unfold:
same thing
any idea what the issue could be? maybe a bug in kedro-viz?
n
@Hugo Evers possibly. @Rashida Kanchwala mentioned she may know something about this and will have a look
h
great! if i can be of any help in providing reproducible examples let me know!
r
@Hugo Evers -- this is a known issue in Kedro-viz and it's in our backlog to fix it. We will try to prioritize this - https://github.com/kedro-org/kedro-viz/issues/1123
@Tynan FYI
n
Thanks, I have linked this thread to the issue
n
Thanks @Hugo Evers @Nok Lam Chan and @Rashida Kanchwala