Hi everyone Is it possible to combine multiple Kedro project Kedro #questions


Rennan Haro

08/15/2023, 2:05 PM

Hi everyone. Is it possible to combine multiple Kedro projects (2 or 3) into a single viz? Curious to hear some ideas 🙂 The context is that we have a couple of Kedro projects at the client, but having a single viz graph to visualize all of them would help us track upstream/downstream dependencies.


Rennan Haro

08/15/2023, 2:06 PM

Giving a bit more context on the projects: they are separate Kedro projects that share one single catalog. One project is exclusively for data ingestion/preprocessing: getting data from multiple sources and storing it in the silver layer. The other projects are downstream consumers, meaning that they get data exclusively from silver.
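A minimal sketch of how the upstream/downstream links in such a layout could be derived from dataset names alone. The project names and dataset names below are hypothetical; in a real setup the input and output sets could presumably be collected from each project's pipelines (for example via `Pipeline.inputs()` and `Pipeline.outputs()`).

```python
from itertools import permutations

# Hypothetical projects and dataset names, for illustration only.
projects = {
    "ingestion": {
        "inputs": {"raw_orders"},
        "outputs": {"silver_orders", "silver_customers"},
    },
    "forecasting": {"inputs": {"silver_orders"}, "outputs": {"forecast"}},
    "segmentation": {"inputs": {"silver_customers"}, "outputs": {"segments"}},
}


def cross_project_edges(projects):
    """Return (upstream, downstream, shared_datasets) for every ordered pair
    of projects where the first one's outputs feed the second one's inputs."""
    edges = []
    for upstream, downstream in permutations(projects, 2):
        shared = projects[upstream]["outputs"] & projects[downstream]["inputs"]
        if shared:
            edges.append((upstream, downstream, sorted(shared)))
    return edges


for upstream, downstream, shared in cross_project_edges(projects):
    print(f"{upstream} -> {downstream} via {', '.join(shared)}")
```

This only relies on the shared catalog using the same dataset name in every project, which is what makes the silver layer the natural join point.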

datajoely

08/15/2023, 2:06 PM

how are you achieving a shared catalog - are they sharing the same filesystem?

Rennan Haro

08/15/2023, 2:08 PM

Yes, everything in s3 (and a couple of SQLQueryDataSets as sources to the ingestion project)

datajoely

08/15/2023, 2:11 PM

I’m still a little confused about how they share things: are the data files on S3, or the actual YAML code?

Nok Lam Chan

08/15/2023, 2:13 PM

Can they technically be one Kedro project? If that’s the case, it may be easy to create a “meta project”, or to use one project as the base and import pipelines from the others.


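# xxx and yyy stand in for the other projects' packages
from xxx import register_pipelines as register_x
from yyy import register_pipelines as register_y

# register_pipelines() returns a dict of named pipelines,
# so take each project's default pipeline
pipelines = {}
pipelines["x"] = register_x()["__default__"]
pipelines["y"] = register_y()["__default__"]
pipelines["__default__"] = sum(pipelines.values())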

Additionally, it may require installing both projects or adding both projects to PYTHONPATH.

Nok Lam Chan

08/15/2023, 2:14 PM

Hm, just thinking more: the catalog in this case won’t be shared, as you probably have the entries defined separately?

datajoely

08/15/2023, 2:15 PM

Whilst we’re getting to the bottom of this: the quick and dirty way to do this may be to fake it, i.e. run kedro viz --save-file twice, get the JSON, modify it, and then kedro viz --load-file.
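A rough sketch of what the "modify it" step might look like in Python. It assumes each export is a JSON object with a "nodes" list (entries carrying an "id") and an "edges" list (entries carrying "source" and "target"), and that a dataset shared between projects gets the same node id in both exports; none of that is confirmed in this thread, so check it against the actual files. The function name and sample payloads are made up for illustration.

```python
def merge_viz_exports(*exports):
    """Merge several viz-style JSON payloads (already parsed into dicts).

    Assumption: "nodes" entries have an "id", "edges" entries have
    "source" and "target". Nodes and edges are de-duplicated; other
    list-valued keys are unioned; for any other key the first export wins.
    """
    merged = {"nodes": [], "edges": []}
    seen_nodes, seen_edges = set(), set()
    for export in exports:
        for node in export.get("nodes", []):
            if node["id"] not in seen_nodes:
                seen_nodes.add(node["id"])
                merged["nodes"].append(node)
        for edge in export.get("edges", []):
            pair = (edge["source"], edge["target"])
            if pair not in seen_edges:
                seen_edges.add(pair)
                merged["edges"].append(edge)
        for key, value in export.items():
            if key in ("nodes", "edges"):
                continue
            if isinstance(value, list):
                bucket = merged.setdefault(key, [])
                for item in value:
                    if item not in bucket:
                        bucket.append(item)
            else:
                merged.setdefault(key, value)
    return merged


# Tiny made-up payloads standing in for two saved exports.
ingestion = {
    "nodes": [{"id": "a", "name": "ingest"}, {"id": "s", "name": "silver_table"}],
    "edges": [{"source": "a", "target": "s"}],
    "layers": ["silver"],
}
training = {
    "nodes": [{"id": "s", "name": "silver_table"}, {"id": "t", "name": "train"}],
    "edges": [{"source": "s", "target": "t"}],
    "layers": ["silver", "model"],
}
combined = merge_viz_exports(ingestion, training)
print(len(combined["nodes"]), len(combined["edges"]))
```

With real files, the payloads would be read with json.load and the result written back with json.dump before loading it into the viz. Non-list keys keep the value from the first export, so anything nested may still need manual merging.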

Rennan Haro

08/15/2023, 2:16 PM

Working with the json exports was the first thing that came to mind

Rennan Haro

08/15/2023, 2:18 PM

We’re centralizing the catalog in our ingestion project, and the other projects are referencing that catalog location (it’s a monorepo with all projects)

Rennan Haro

08/15/2023, 2:22 PM

The idea of separating into different projects rather than pipelines sparked from the fact that the dependencies for ingestion, preprocessing and training are very different, and the downstream projects (pipelines) are also very different from one another

datajoely

08/15/2023, 2:22 PM

yes, makes sense

datajoely

08/15/2023, 2:22 PM

we’re working on something that will make this much easier

datajoely

08/15/2023, 2:22 PM

but I can’t think of a neat way to do it today

Nok Lam Chan

08/15/2023, 2:27 PM

“Working with the json exports was the first thing that came to mind”

In that case, a meta project may work, but that requires having all the dependencies installed. Let us know if you have success with concatenating the JSON.

Rennan Haro

08/15/2023, 2:36 PM

Yep! Thanks a lot! I’ll update once we find a solution 🙂
