#questions

Rennan Haro

08/15/2023, 2:05 PM
Hi everyone. Is it possible to combine multiple Kedro projects (2 or 3) into a single `viz`? Curious to hear some ideas 🙂 The context is that we have a couple of Kedro projects in the client, but having a single `viz` graph to visualize all of them would help us track upstream/downstream dependencies.
Giving a bit more context into the projects: They are separate Kedro projects, that share one single `catalog`. One project is exclusively for data ingestion/preprocessing: getting data from multiple `sources` and storing them in the `silver` layer. The other projects are downstream consumers, meaning that they get data exclusively from `silver`.

datajoely

08/15/2023, 2:06 PM
how are you achieving a shared catalog - are they sharing the same filesystem?

Rennan Haro

08/15/2023, 2:08 PM
Yes, everything in `s3` (and a couple of `SQLQueryDataSets` as sources to the ingestion project)

datajoely

08/15/2023, 2:11 PM
I’m still a little confused about how they share things - are the data files on S3, or the actual YAML code?

Nok Lam Chan

08/15/2023, 2:13 PM
Could they technically be one Kedro project? If that’s the case, it may be easy to create a “meta project”, or to use one project as the base and import pipelines from the others.
from xxx import register_pipelines as register_x
from yyy import register_pipelines as register_y

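pipelines = {}
# register_pipelines() returns a dict of pipelines, so take each project's default
pipelines["x"] = register_x()["__default__"]
pipelines["y"] = register_y()["__default__"]
# sum the Pipeline objects (the dict values), not the dict keys
pipelines["__default__"] = sum(pipelines.values())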
Additionally, it may require installing both projects or adding both projects to `PYTHONPATH`.
Hm - just thinking more, the catalog in this case won’t be shared, as you probably have the entries defined separately?

datajoely

08/15/2023, 2:15 PM
whilst we’re getting to the bottom of this - the quick and dirty way to do this may be to fake it. I.e. run `kedro viz --save-file` twice, get the JSON, modify it and then `kedro viz --load-file`
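A minimal sketch of the "get the JSON, modify it" step, in case it helps. It assumes each export is a JSON object whose list-valued keys (for example `nodes` and `edges`) hold dicts, that nodes carry an `id` field, and that edges carry `source` and `target` fields. That layout is an assumption rather than a documented schema, so check it against your own exports first. `merge_viz_exports` and the sample data are hypothetical names for illustration.

```python
import json


def _identity(item):
    """Return a hashable key used to spot duplicates across exports."""
    if isinstance(item, dict):
        if "id" in item:  # assumed shape of a node entry
            return ("id", item["id"])
        if "source" in item and "target" in item:  # assumed shape of an edge
            return ("edge", item["source"], item["target"])
    # Anything else is compared by its serialized content.
    return ("json", json.dumps(item, sort_keys=True))


def merge_viz_exports(*exports: dict) -> dict:
    """Merge several graph dicts, de-duplicating entries in list-valued keys."""
    merged: dict = {}
    seen: dict = {}
    for export in exports:
        for key, value in export.items():
            if key not in merged:
                merged[key] = [] if isinstance(value, list) else value
            if not (isinstance(value, list) and isinstance(merged[key], list)):
                # Non-list values (strings, nested dicts): the first export wins.
                continue
            seen_keys = seen.setdefault(key, set())
            for item in value:
                identity = _identity(item)
                if identity not in seen_keys:
                    seen_keys.add(identity)
                    merged[key].append(item)
    return merged


# Synthetic stand-ins for two exports that share one dataset node ("s").
ingestion = {
    "nodes": [{"id": "a", "name": "load"}, {"id": "s", "name": "silver_table"}],
    "edges": [{"source": "a", "target": "s"}],
}
training = {
    "nodes": [{"id": "s", "name": "silver_table"}, {"id": "t", "name": "train"}],
    "edges": [{"source": "s", "target": "t"}],
}
combined = merge_viz_exports(ingestion, training)
```

With real exports you would `json.load` each saved file, pass the resulting dicts to the function, and `json.dump` the result before loading it back into the viz. The merged graph only links the projects if a shared dataset gets the same `id` in every export, which is worth verifying on one known `silver` dataset before relying on it.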

Rennan Haro

08/15/2023, 2:16 PM
Working with the json exports was the first thing that came to mind
We’re centralizing the catalog in our ingestion project, and the other projects are referencing that catalog location (it’s a monorepo with all projects)
The idea of separating into different projects rather than pipelines sparked from the fact that the dependencies for ingestion, preprocessing and training are very different, and the downstream projects (pipelines) are also very different from one another

datajoely

08/15/2023, 2:22 PM
yes, makes sense
we’re working on something that will make this much easier
but I can’t think of a neat way to do it today

Nok Lam Chan

08/15/2023, 2:27 PM
“Working with the json exports was the first thing that came to mind”
In that case a meta project may work, but that requires you to have all the dependencies installed. Let us know if you have success with concatenating the JSON.

Rennan Haro

08/15/2023, 2:36 PM
Yep! Thanks a lot! I’ll update once we find a solution 🙂