mattia.paterna
06/06/2024, 2:24 PM
• namespaced datasets following the pattern <namespace>.<name>, e.g. train.docs, evaluate.docs, etc.
• one global data catalog where the datasets are defined without namespaces; ideally, they are shared across pipelines, e.g. full-data.
I then compose and run the pipelines; however, I notice that two artifacts are created for the dataset defined inside the global data catalog: train-full-data and evaluate-full-data.
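To make it concrete, this is roughly how I compose the pipelines (a minimal sketch; make_base_pipeline and its node are placeholders, only full-data and the namespaces match my actual setup):

from kedro.pipeline import node, pipeline

def make_base_pipeline():
    # Placeholder base pipeline; the real one reads the shared dataset.
    return pipeline([
        node(lambda data: data, inputs="full-data", outputs="docs", name="prepare_docs"),
    ])

# Namespacing prefixes every dataset name, including the shared input:
train_pipeline = pipeline(make_base_pipeline(), namespace="train")
evaluate_pipeline = pipeline(make_base_pipeline(), namespace="evaluate")

# train_pipeline now reads train.full-data and evaluate_pipeline reads
# evaluate.full-data, so two artifacts get created instead of one.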
I read in the documentation that you can create a dataset factory pattern if you have, for example, the same output across namespaced modular pipelines. What about namespaced modular pipelines that share the same input instead?
I would expect a behaviour where:
• if datasets have a namespace, they are associated with the pipeline of the corresponding namespace;
• if datasets do not have a namespace, they are shared across all namespaced modular pipelines that reference them.
I hope it makes sense.

Ian Whalen
06/06/2024, 2:43 PM
"What about namespaced modular pipelines that share the same input instead?" If I'm understanding correctly, I think you want the inputs keyword in the pipeline function. (See docs here)
So defining:
from kedro.pipeline import pipeline

train_pipeline = pipeline(
    base_pipeline,
    namespace="train",
    inputs={"full-data"},  # Note this is a set.
)
Then your pipeline will read from full-data, not train.full-data.
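For completeness, a sketch applying the same keyword to both of your pipelines (assuming base_pipeline is your shared modular pipeline) so they read the one shared entry:

from kedro.pipeline import pipeline

# Declare full-data as a shared, non-namespaced input in both pipelines.
train_pipeline = pipeline(base_pipeline, namespace="train", inputs={"full-data"})
evaluate_pipeline = pipeline(base_pipeline, namespace="evaluate", inputs={"full-data"})

# Both now read the single full-data entry from the global catalog and
# can be summed into one pipeline:
full_pipeline = train_pipeline + evaluate_pipeline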
Does that help?

mattia.paterna
06/06/2024, 3:04 PM
I have used the outputs keyword for composing disconnected pipelines, but I was not aware of inputs.
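(For reference, this is roughly how I have been using outputs; the dataset names in this sketch are made up:)

from kedro.pipeline import pipeline

# Expose a namespaced output under a shared, un-namespaced name so that
# a downstream, otherwise disconnected pipeline can consume it.
train_pipeline = pipeline(
    base_pipeline,  # assumed shared modular pipeline, as above
    namespace="train",
    outputs={"model": "trained-model"},  # made-up dataset names
)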
I will give it a try and let you know. Thanks! 🙌

mattia.paterna
06/11/2024, 2:48 PM