# questions
r
Hi everyone. Is it possible to combine multiple Kedro projects (2 or 3) into a single `viz`? Curious to hear some ideas 🙂 The context is that we have a couple of Kedro projects at the client, and having a single `viz` graph to visualize all of them would help us track upstream/downstream dependencies.
ā˜ļø 1
Giving a bit more context on the projects: they are separate Kedro projects that share one single `catalog`. One project is exclusively for data ingestion/preprocessing - getting data from multiple `sources` and storing them in the `silver` layer. The other projects are downstream consumers, meaning that they get data exclusively from `silver`.
d
how are you achieving a shared catalog - are they sharing the same filesystem?
r
Yes, everything in `s3` (and a couple of `SQLQueryDataSets` as sources to the ingestion project)
d
I'm still a little confused how they share things - are the data files on S3, or the actual YAML code?
n
Could they technically be one Kedro project? If that's the case it may be easy to create a "meta project", or to use one as a base and import pipelines from the other.
```python
from xxx import register_pipelines as register_x
from yyy import register_pipelines as register_y

# register_pipelines() returns a dict of pipelines, so pick out each
# project's "__default__" pipeline and combine them
pipelines = {}
pipelines["x"] = register_x()["__default__"]
pipelines["y"] = register_y()["__default__"]
pipelines["__default__"] = sum(pipelines.values())
```
Additionally, it may require installing both projects or adding both projects to `PYTHONPATH`.
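For the `PYTHONPATH` route, a minimal sketch; the `project_x`/`project_y` names and the `src` layout are assumptions about the monorepo structure:

```shell
# Make both Kedro projects importable without pip-installing them.
# Paths are illustrative - adjust to the actual repo layout.
export PYTHONPATH="$PWD/project_x/src:$PWD/project_y/src:$PYTHONPATH"
```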
Hm - just thinking more, the catalog in this case won't be shared, as you probably have the entries defined separately?
d
whilst we're getting to the bottom of this - the quick and dirty way to do this may be to fake it, i.e. run `kedro viz --save-file` twice, get the JSON, modify it, and then `kedro viz --load-file`
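A rough sketch of that merge step, assuming each `kedro viz --save-file` export is a JSON object with top-level `nodes` and `edges` lists (key names worth verifying against your kedro-viz version):

```python
import json


def merge_viz_exports(path_a: str, path_b: str, out_path: str) -> None:
    """Naively concatenate the node/edge lists of two kedro-viz JSON exports,
    de-duplicating entries shared by both graphs (e.g. the silver-layer datasets)."""
    with open(path_a) as f:
        a = json.load(f)
    with open(path_b) as f:
        b = json.load(f)

    merged = dict(a)  # keep any other top-level keys from the first export
    for key in ("nodes", "edges"):
        # serialize each entry so identical dicts from both files count as one
        seen = {json.dumps(item, sort_keys=True) for item in a.get(key, [])}
        merged[key] = a.get(key, []) + [
            item
            for item in b.get(key, [])
            if json.dumps(item, sort_keys=True) not in seen
        ]

    with open(out_path, "w") as f:
        json.dump(merged, f)
```

The merged file can then be fed back with `kedro viz --load-file`; entries that only appear in the second export are appended, while datasets present in both graphs show up once.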
r
Working with the JSON exports was the first thing that came to mind
We're centralizing the catalog in our ingestion project, and the other projects reference that catalog location (it's a monorepo with all the projects)
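One way to express that in Kedro (0.18+) is pointing each downstream project's `CONF_SOURCE` at the shared conf directory; the relative path below is an assumption about the monorepo layout, not the poster's actual setup:

```python
# src/<downstream_project>/settings.py
# Point this project at the ingestion project's conf directory,
# so all projects resolve the same catalog entries.
# The relative path is illustrative - adjust to the monorepo layout.
CONF_SOURCE = "../ingestion/conf"
```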
The idea of separating into different projects rather than pipelines came from the fact that the dependencies for ingestion, preprocessing and training are very different, and the downstream projects (pipelines) are also very different from one another
d
yes, makes sense
we're working on something that will make this much easier
but I can't think of a neat way to do it today
n
> Working with the JSON exports was the first thing that came to mind
In that case a meta project may work, but that requires having all the dependencies installed. Let us know if you have success with concatenating the JSON.
r
Yep! Thanks a lot! I'll update once we find a solution 🙂