Hi all I get this weird bug with kedro viz i have this set o Kedro #questions


Hugo Evers

07/03/2023, 3:46 PM

Hi all, I've run into a weird bug with kedro viz: I have a set of nested modular pipelines that keep my training, test, and fine-tuning pipelines completely DRY across different languages. To that end, I map the train and test splits through my pipelines so I can namespace them at the last moment. But when I visualize the pipeline, I see these two unconnected artifacts, Test and Train:

Hugo Evers

07/03/2023, 3:48 PM

Which is weird, because when I open the nested pipelines these disappear, and I can only get back to this state by refreshing the browser:

Hugo Evers

07/03/2023, 3:48 PM

but you can see that the train and test datasets are represented twice in the visualisation

Hugo Evers

07/03/2023, 3:49 PM

so if I hide it, for example:

Hugo Evers

07/03/2023, 3:49 PM

I am fairly sure this is not an issue with my code, but it is very difficult to debug

Nok Lam Chan

07/03/2023, 3:50 PM

Are you using namespaces?

Hugo Evers

07/03/2023, 3:50 PM

yes

Hugo Evers

07/03/2023, 3:50 PM

not everywhere though

Nok Lam Chan

07/03/2023, 3:50 PM

viz uses the … notation to detect whether something is a namespaced pipeline

Nok Lam Chan

07/03/2023, 3:50 PM

did you use it for something that doesn't belong to a namespace?

Hugo Evers

07/03/2023, 3:51 PM

so basically, train and test are namespaced

Hugo Evers

07/03/2023, 3:51 PM

and then I namespace per language

Hugo Evers

07/03/2023, 3:51 PM

e.g. Finnish

Nok Lam Chan

07/03/2023, 3:51 PM

can you share your catalog?

Hugo Evers

07/03/2023, 3:51 PM

so train and test are both MemoryDataSets

Hugo Evers

07/03/2023, 3:52 PM

they're not defined in the catalog (because that's pretty difficult with namespacing at the moment)

Nok Lam Chan

07/03/2023, 3:52 PM

how is it defined in your pipeline?

Hugo Evers

07/03/2023, 3:52 PM


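from kedro.pipeline.modular_pipeline import pipeline as modular_pipeline

full_pipeline = train_test_pipeline + train_pipeline + evaluation_pipeline

# the same full_pipeline is wrapped once per language namespace
finnish_pipeline = modular_pipeline(
    pipe=full_pipeline,
    inputs={"filtered_n_validated_data": "finnish_jobs"},
    outputs={},
    namespace="finnish",
)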

Hugo Evers

07/03/2023, 3:52 PM

this full_pipeline is going to be re-used several times

Hugo Evers

07/03/2023, 3:52 PM

within different namespaces

Nok Lam Chan

07/03/2023, 3:53 PM

so train and test are both memorydatasets

Can you show the pipeline for this?

Hugo Evers

07/03/2023, 3:53 PM


evaluation_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(inference_tag=True, **kwargs),
            inputs={"X_n_y": "test"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=test_model_pipeline,
            inputs={
                "test": "hf_dataset",  # or: train.hf_dataset
                # 'test':'test.hf_dataset',
            },
        )
    ),
    inputs={
        "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="test",
)

Hugo Evers

07/03/2023, 3:55 PM


train_test_pipeline = create_train_test_pipeline(**kwargs)

train_model_pipeline = create_train_model_pipeline(**kwargs)
test_model_pipeline = create_predict_pipeline(**kwargs)

Hugo Evers

07/03/2023, 3:55 PM


train_pipeline = modular_pipeline(
    pipe=(
        modular_pipeline(
            pipe=create_base_pipeline(**kwargs),
            inputs={"X_n_y": "train"},
            # namespace="train"
        )
        + modular_pipeline(
            pipe=train_model_pipeline,
            inputs={
                "train": "hf_dataset",  # or: train.hf_dataset
            },
        )
    ),
    inputs={
        "train": "train",
        "test": "test",
        "isco_names": "isco_names",
    },
    namespace="train",
)
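To trace where names like finnish.train come from, here is a simplified, pure-Python model of how each modular_pipeline wrapper renames a dataset: an explicitly mapped input is used verbatim, a params: entry gets the namespace inserted after the params: prefix, and anything else gets a "namespace." prefix. The helpers rename and resolve are hypothetical, not Kedro APIs, and they ignore transcoding (@) and other edge cases; the layer tuples transcribe the wrappers posted in this thread, and treating X as an intermediate of create_base_pipeline is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# One wrapper layer: (namespace or None, inputs mapping). Hypothetical model.
Layer = Tuple[Optional[str], Dict[str, str]]


def rename(name: str, namespace: Optional[str], mapping: Dict[str, str]) -> str:
    """Simplified model of one modular_pipeline wrapper renaming a dataset."""
    if name in mapping:
        # Mapped names are used verbatim and never prefixed.
        return mapping[name]
    if namespace is None:
        # A wrapper without a namespace only applies its mapping.
        return name
    if name.startswith("params:"):
        # The namespace goes after the params: prefix.
        return f"params:{namespace}.{name[len('params:'):]}"
    return f"{namespace}.{name}"


def resolve(name: str, layers: List[Layer]) -> str:
    """Apply wrappers from the innermost to the outermost."""
    for namespace, mapping in layers:
        name = rename(name, namespace, mapping)
    return name


# Layers transcribed from the snippets in this thread (innermost first).
FINNISH: Layer = ("finnish", {"filtered_n_validated_data": "finnish_jobs"})
TRAIN_NS: Layer = ("train", {"train": "train", "test": "test", "isco_names": "isco_names"})
TEST_NS: Layer = ("test", {
    "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
    "test": "test",
    "isco_names": "isco_names",
})

base_in_train: List[Layer] = [(None, {"X_n_y": "train"}), TRAIN_NS, FINNISH]
model_in_train: List[Layer] = [(None, {"train": "hf_dataset"}), TRAIN_NS, FINNISH]
model_in_test: List[Layer] = [(None, {"test": "hf_dataset"}), TEST_NS, FINNISH]

print(resolve("X_n_y", base_in_train))   # finnish.train
print(resolve("X", base_in_train))       # finnish.train.X
print(resolve("train", model_in_train))  # finnish.train.hf_dataset
print(resolve("params:finetuner.training_args", model_in_train))
# params:finnish.train.finetuner.training_args
print(resolve("finetuned_pre_trained_isco_classifier", model_in_test))
# finnish.train.finetuned_pre_trained_isco_classifier
```

These results agree with names in the kedro catalog list output posted in this thread. The first line also shows that inside the train namespace the split is literally called train, the same as the namespace itself, so after the finnish wrapper both the dataset and the modular pipeline end up as finnish.train.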

Hugo Evers

07/03/2023, 3:57 PM


from kedro.pipeline import Pipeline, node, pipeline


def create_train_test_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=isco_balance_train_test_split,
                inputs={
                    "df": "filtered_n_validated_data",
                    "sample_size_frac": "params:sample_size_frac",
                    **kwargs,
                },
                outputs=["train", "test"],
                name="isco_balanced_train_test_split",
            ),
        ]
    )  # type: ignore
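A Kedro node with dict-style inputs calls its function with those keyword arguments, and a list of outputs is matched positionally against the returned tuple, so "train" and "test" here are just labels that namespacing later rewrites. The sketch below models that wiring in plain Python. run_node and toy_balanced_split are hypothetical stand-ins (not Kedro's runner and not the real isco_balance_train_test_split), and reading sample_size_frac as the per-label fraction held out for test is an assumption.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


def run_node(func: Callable[..., Any], inputs: Dict[str, str],
             outputs: List[str], catalog: Dict[str, Any]) -> None:
    """Toy model of node wiring; only list outputs are modelled here."""
    # dict inputs: function argument name -> dataset name in the catalog
    kwargs = {arg: catalog[ds] for arg, ds in inputs.items()}
    result = func(**kwargs)
    # list outputs: the returned tuple is assigned to the names in order
    if len(result) != len(outputs):
        raise ValueError(f"expected {len(outputs)} outputs, got {len(result)}")
    for name, value in zip(outputs, result):
        catalog[name] = value


def toy_balanced_split(df: List[Dict[str, Any]],
                       sample_size_frac: float) -> Tuple[list, list]:
    """Hypothetical per-label split; sample_size_frac = test fraction (assumed)."""
    by_label: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in df:
        by_label[row["isco"]].append(row)
    train, test = [], []
    for rows in by_label.values():
        n_test = max(1, round(len(rows) * sample_size_frac))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


catalog: Dict[str, Any] = {
    "filtered_n_validated_data": [{"isco": "A", "id": i} for i in range(4)]
    + [{"isco": "B", "id": i} for i in range(4, 6)],
    "params:sample_size_frac": 0.5,
}
run_node(
    toy_balanced_split,
    inputs={"df": "filtered_n_validated_data",
            "sample_size_frac": "params:sample_size_frac"},
    outputs=["train", "test"],
    catalog=catalog,
)
print(len(catalog["train"]), len(catalog["test"]))  # 3 3
```

Kedro also accepts a single string as outputs, in which case the return value is stored as-is; the toy runner does not model that case.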

Hugo Evers

07/03/2023, 3:58 PM

those are all the relevant entries

Nok Lam Chan

07/03/2023, 4:01 PM


inputs={
    "finetuned_pre_trained_isco_classifier": "train.finetuned_pre_trained_isco_classifier",
    "test": "test",
    "isco_names": "isco_names",
},

So you did define "test" as a dataset?

Hugo Evers

07/03/2023, 4:03 PM

test is just the output of the test_train_split

Nok Lam Chan

07/03/2023, 4:04 PM

This is very hard to understand. From your screenshot, which of these is the unexpected one?
• finnish.train (dataset)
• finnish.test (dataset)
• finnish.train (a modular pipeline)
• finnish.test (a modular pipeline)
• train
• test

Nok Lam Chan

07/03/2023, 4:04 PM

and maybe running kedro catalog list will help to debug

Hugo Evers

07/03/2023, 4:05 PM

train and test are the unexpected ones

Hugo Evers

07/03/2023, 4:05 PM

expected:
• finnish.train (dataset)
• finnish.test (dataset)
• finnish.train (a modular pipeline)
• finnish.test (a modular pipeline)

Hugo Evers

07/03/2023, 4:05 PM

test <-> finnish.test (dataset)

Hugo Evers

07/03/2023, 4:06 PM

train <-> finnish.train

Hugo Evers

07/03/2023, 4:06 PM

somehow these are connected in the GUI

Hugo Evers

07/03/2023, 4:06 PM

so kedro makes them out to be the same dataset

Hugo Evers

07/03/2023, 4:06 PM

and I can't find where they are coming from

Hugo Evers

07/03/2023, 4:06 PM

these un-namespaced test and train datasets

Hugo Evers

07/03/2023, 4:07 PM

when I unfold the viz, they are shown to correspond to the namespaced finnish.train datasets

Nok Lam Chan

07/03/2023, 4:07 PM

I am not really sure, but maybe it's because you are using the same name for a pipeline namespace and a dataset

Nok Lam Chan

07/03/2023, 4:08 PM

can you try using a dataset name other than finnish.train and finnish.test?

Hugo Evers

07/03/2023, 4:09 PM

DataSets in '__default__' pipeline:
  Datasets mentioned in pipeline:
    DefaultDataSet:
    - finnish.test.y
    - finnish.train.finetuned_pre_trained_isco_classifier
    - finnish.test.hf_dataset
    - finnish.test.isco_classification_metrics
    - finnish.isco_names
    - params:finnish.train.finetuner.pretrained_model_name
    - finnish.train.tokenized_X
    - finnish.test.tokenized_X
    - finnish.train
    - params:finnish.train.finetuner.training_args
    - finnish.train.X
    - finnish.test
    - params:finnish.test.finetuner.pretrained_model_name
    - finnish.test.predicted_isco
    - finnish.train.hf_dataset
    - finnish.train.preprocessed_X
    - finnish.train.y
    - finnish.test.X
    - finnish.test.mapped_dataset
    - finnish.train.mapped_dataset
    - finnish.test.preprocessed_X
    - finnish.train.pre_trained_transformer
    - params:finnish.sample_size_frac
    SQLQueryDataSet:
    - finnish_jobs
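Nok's hypothesis is that one name is being used both for a namespace and for a dataset. The sketch below checks the dataset names from the catalog listing above for exactly that pattern: a name that is itself a dataset and also the dotted prefix of other datasets. namespace_collisions is a hypothetical helper, not part of Kedro or Kedro-Viz; it only flags the overlap and does not by itself show that the overlap causes the rendering problem.

```python
from typing import Iterable, List


def namespace_collisions(names: Iterable[str]) -> List[str]:
    """Names that are both a dataset and a dotted prefix of another dataset."""
    # parameters are ignored; only regular dataset names are compared
    datasets = {n for n in names if not n.startswith("params:")}
    prefixes = set()
    for name in datasets:
        parts = name.split(".")
        # every proper prefix, e.g. "finnish" and "finnish.train" for "finnish.train.X"
        for i in range(1, len(parts)):
            prefixes.add(".".join(parts[:i]))
    return sorted(datasets & prefixes)


# Names copied from the kedro catalog list output above (params mostly omitted).
CATALOG_NAMES = [
    "finnish.test.y", "finnish.train.finetuned_pre_trained_isco_classifier",
    "finnish.test.hf_dataset", "finnish.test.isco_classification_metrics",
    "finnish.isco_names", "finnish.train.tokenized_X", "finnish.test.tokenized_X",
    "finnish.train", "finnish.train.X", "finnish.test", "finnish.test.predicted_isco",
    "finnish.train.hf_dataset", "finnish.train.preprocessed_X", "finnish.train.y",
    "finnish.test.X", "finnish.test.mapped_dataset", "finnish.train.mapped_dataset",
    "finnish.test.preprocessed_X", "finnish.train.pre_trained_transformer",
    "params:finnish.sample_size_frac", "finnish_jobs",
]

print(namespace_collisions(CATALOG_NAMES))  # ['finnish.test', 'finnish.train']
```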

Hugo Evers

07/03/2023, 4:09 PM

so from the catalog list, one deduces that the train and test datasets are truly just artefacts

Hugo Evers

07/03/2023, 4:09 PM

in kedro viz

Hugo Evers

07/03/2023, 4:10 PM

at least, they don't show up here

Nok Lam Chan

07/03/2023, 4:10 PM

can you try using a dataset name other than finnish.train and finnish.test?

Hugo Evers

07/03/2023, 4:10 PM

okay

Hugo Evers

07/03/2023, 4:15 PM

it's still there

Hugo Evers

07/03/2023, 4:16 PM

and when I unfold:

Hugo Evers

07/03/2023, 4:16 PM

same thing

Hugo Evers

07/04/2023, 7:36 AM

any idea what the issue could be? maybe a bug in kedro-viz?

Nok Lam Chan

07/04/2023, 8:38 AM

@Hugo Evers possibly. @Rashida Kanchwala mentioned she may know something about this and will take a look

Hugo Evers

07/04/2023, 8:53 AM

great! If I can help by providing reproducible examples, let me know!


Rashida Kanchwala

07/04/2023, 10:19 AM

@Hugo Evers -- this is a known issue in Kedro-viz and it's in our backlog to fix it. We will try to prioritize this - https://github.com/kedro-org/kedro-viz/issues/1123

Rashida Kanchwala

07/04/2023, 10:21 AM

@Tynan FYI

Nok Lam Chan

07/04/2023, 10:26 AM

Thanks, I have linked this thread to the issue

Nero Okwa

07/06/2023, 9:25 AM

Thanks @Hugo Evers @Nok Lam Chan and @Rashida Kanchwala
