# questions
Hi everyone. Is it possible to combine multiple Kedro projects (2 or 3) into a single …? Curious to hear some ideas 🙂 The context is that we have a couple of Kedro projects at the client, and having a single graph to visualize all of them would help us track upstream/downstream dependencies.
Some more context on the projects: they are separate Kedro projects that share one single …. One project is exclusively for data ingestion/preprocessing, getting data from multiple … and storing it in the … layer. The other projects are downstream consumers, meaning that they get data exclusively from …
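Because the downstream projects only read datasets that the ingestion project writes through the shared catalog, the links between projects can in principle be recovered by matching dataset names. A minimal sketch, assuming each project's read and write sets are already known: the project and dataset names below are hypothetical, and in a real Kedro project the sets might be taken from something like `Pipeline.inputs()` (free inputs) and `Pipeline.all_outputs()` on each project's default pipeline.

```python
from __future__ import annotations

from itertools import product

# Hypothetical projects and dataset names: for each project, the catalog
# entries it reads ("inputs") and the ones it writes ("outputs").
projects = {
    "ingestion": {"inputs": {"raw_source_a", "raw_source_b"},
                  "outputs": {"clean_customers", "clean_orders"}},
    "training": {"inputs": {"clean_customers", "clean_orders"},
                 "outputs": {"model"}},
    "reporting": {"inputs": {"clean_orders"}, "outputs": {"report"}},
}


def cross_project_edges(
    projects: dict[str, dict[str, set[str]]],
) -> set[tuple[str, str, str]]:
    """Return (upstream, downstream, dataset) for every dataset that one
    project writes and a different project reads."""
    edges = set()
    for (up, up_io), (down, down_io) in product(projects.items(), repeat=2):
        if up == down:
            continue  # ignore datasets produced and consumed in one project
        for dataset in up_io["outputs"] & down_io["inputs"]:
            edges.add((up, down, dataset))
    return edges


for edge in sorted(cross_project_edges(projects)):
    print(edge)
```

Each triple is the information a combined graph needs to draw an edge from one project to another through a shared dataset.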
How are you achieving a shared catalog? Are they sharing the same filesystem?
Yes, everything is in … (and a couple of … as sources to the ingestion project).
I’m still a little confused about how they share things: is it the data files on S3, or the actual YAML code?
Could they technically be one Kedro project? If so, it may be easy to create a “meta project”, or to use one as the base and import the pipelines from the other.
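```python
# xxx and yyy stand for the package names of the two Kedro projects
from xxx.pipeline_registry import register_pipelines as register_x
from yyy.pipeline_registry import register_pipelines as register_y

# each project's register_pipelines() returns a dict; take its default pipeline
pipelines = {}
pipelines["x"] = register_x()["__default__"]
pipelines["y"] = register_y()["__default__"]
# Kedro Pipelines support +, so sum() combines them into one default pipeline
pipelines["__default__"] = sum(pipelines.values())
```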
Additionally, this may require installing both projects, or adding both projects into …
Hm, thinking about it more: the catalog won’t be shared in this case, since you probably have the entries defined separately?
Whilst we’re getting to the bottom of this, the quick and dirty way may be to fake it, i.e. run `kedro viz --save-file` twice, get the JSON, modify it, and then run `kedro viz --load-file`.
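A minimal sketch of the “get the JSON, modify it” step, assuming each `--save-file` export has top-level `"nodes"` (each with an `"id"`) and `"edges"` (each with `"source"` and `"target"`). That schema, and the assumption that a dataset shared through the common catalog gets the same node id in both exports, are not confirmed here, so check them against an actual export first. `merge_viz_exports` and `merge_viz_files` are hypothetical helper names.

```python
from __future__ import annotations

import json
from pathlib import Path


def merge_viz_exports(exports: list[dict]) -> dict:
    """Merge several Kedro-Viz JSON exports into one graph.

    Assumes "nodes" entries carry an "id" and "edges" entries carry
    "source"/"target"; every other top-level key is copied unchanged
    from the first export.
    """
    merged = dict(exports[0])
    nodes: dict[str, dict] = {}
    edges: dict[tuple[str, str], dict] = {}
    for export in exports:
        for node in export.get("nodes", []):
            # a node already seen (e.g. a shared dataset) is kept once,
            # which is what stitches the separate graphs together
            nodes.setdefault(node["id"], node)
        for edge in export.get("edges", []):
            edges.setdefault((edge["source"], edge["target"]), edge)
    merged["nodes"] = list(nodes.values())
    merged["edges"] = list(edges.values())
    return merged


def merge_viz_files(paths: list[str], out_path: str) -> None:
    """Read several exported JSON files and write the merged graph."""
    exports = [json.loads(Path(p).read_text()) for p in paths]
    Path(out_path).write_text(json.dumps(merge_viz_exports(exports), indent=2))
```

For example, `merge_viz_files(["ingestion.json", "training.json"], "combined.json")` (hypothetical file names) followed by `kedro viz --load-file combined.json`. Keys other than nodes and edges (such as pipeline or tag lists) come from the first file only, and a shared node keeps the first export's fields, so those may need manual adjustment.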
Working with the JSON exports was the first thing that came to mind.
We’re centralizing the catalog in our ingestion project, and the other projects reference that catalog location (it’s a monorepo containing all the projects).
The idea of splitting into separate projects rather than pipelines came from the fact that the dependencies for ingestion, preprocessing and training are very different, and the downstream projects (pipelines) also differ a lot from one another.
Yes, makes sense.
We’re working on something that will make this much easier,
but I can’t think of a neat way to do it today.
In that case a meta project may work, but it requires having all the dependencies installed. Let us know if you have success concatenating the JSON.
Yep! Thanks a lot! I’ll update once we find a solution 🙂