#questions

Jon Cohen

08/08/2023, 2:08 PM
Is there any guidance on transitioning an existing project to Kedro? I have a data science project which is currently relatively unstructured. I'm attempting to transition to Kedro by creating a directory in the project for the Kedro project to live in and then moving the pipeline over one step at a time.

Sajid Alam

08/08/2023, 2:18 PM
We have plans to add guides on transitioning to Kedro from existing projects. But if you need help with anything in the meantime, please don't hesitate to reach out.

datajoely

08/08/2023, 2:34 PM
I think the best strategy today is to generate an empty Kedro project and transition one pipeline at a time
👍 1

Jon Cohen

08/08/2023, 2:38 PM
An issue with that approach is that this project is mostly one pipeline. It's more scientist code than engineer code and so I wanted to avoid having to transition the whole pipeline all at once. Ideally I wanted to create one node at a time in the same repo so that the old and new pipelines can share data

datajoely

08/08/2023, 2:39 PM
So a kedro pipeline can be any granularity you want
so if you want to merge the data processing part at the beginning you run
kedro pipeline create data_processing
and start working there
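As a rough sketch of what "working there" could look like: `kedro pipeline create data_processing` scaffolds a `nodes.py` and a `pipeline.py` with a `create_pipeline` function. The snippet below shows both halves in one block. The function `drop_incomplete_rows` and the dataset names `raw_measurements` and `clean_measurements` are hypothetical, standing in for whatever one of the existing scripts actually does.

```python
# nodes.py-style code: a plain Python function with no Kedro dependency,
# so the legacy scripts can keep calling it during the migration.
def drop_incomplete_rows(rows):
    """Keep only rows (dicts) in which every field has a value."""
    return [
        row
        for row in rows
        if all(value not in (None, "") for value in row.values())
    ]


# pipeline.py-style code: wires the function into a Kedro pipeline.
def create_pipeline(**kwargs):
    # Imported inside the function so the code above stays usable
    # from the old scripts even where Kedro is not installed.
    from kedro.pipeline import node, pipeline

    return pipeline(
        [
            node(
                func=drop_incomplete_rows,
                inputs="raw_measurements",     # hypothetical catalog entry
                outputs="clean_measurements",  # hypothetical catalog entry
                name="drop_incomplete_rows_node",
            ),
        ]
    )
```

In a generated project the import would normally sit at the top of `pipeline.py`. Once the two dataset names are defined in the Data Catalog, `kedro run --pipeline data_processing` should pick this up.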

Jon Cohen

08/08/2023, 2:42 PM
And then split them out into individual nodes?

datajoely

08/08/2023, 2:42 PM
well just the data processing part in this example

Jon Cohen

08/08/2023, 2:43 PM
So then IIUC
Right now there is a bunch of manual process involved in running this pipeline. We have a folder of Python scripts which ingest the previous step of data output and then call into the next step of data processing, and these need to be called by hand. This project is in the middle of transitioning from client number 1 to clients number 2 and 3, so this is quickly not going to work as we need to start generalizing the data processing pipeline to different client data. One of our big goals in transitioning to Kedro is to be able to get rid of this manual process and have more control over the pipeline. So you're suggesting that we make each script its own Kedro pipeline and move them over one by one with their code dependencies?

datajoely

08/08/2023, 2:50 PM
I think that makes sense
but my push is to move the smallest, most self-contained part first
and see if it works

Jon Cohen

08/08/2023, 2:50 PM
That's the goal
but I think that means the Kedro project needs to be in the same repo

datajoely

08/08/2023, 2:50 PM
Whilst a bit ugly, that shouldn’t be an issue
👍 1

Jon Cohen

08/08/2023, 2:51 PM
What would the advantage be of making each script its own pipeline versus a node in a growing single pipeline?

datajoely

08/08/2023, 2:53 PM
So I think in general that’s the right way to write Kedro anyway: a pipeline can be a very small unit of logic. If you look at the demo viz on https://demo.kedro.org/ you can see each of those ‘mega nodes’ is a modular pipeline with its own namespace, i.e. Ingestion, Feature Engineering
this was adapted from a demo project I made aggggeeeessss ago https://github.com/datajoely/modular-spaceflights
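One way namespaces might help with the multi-client goal mentioned earlier, sketched under assumptions: the same small pipeline is instantiated once per client, and Kedro prefixes the node and dataset names with the namespace. The function `normalise`, the dataset names, and the client labels `client_2` and `client_3` are hypothetical and not taken from the demo project.

```python
def normalise(values):
    """Scale values to the 0-1 range; a stand-in for one legacy script step."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for constant input
    return [(v - low) / span for v in values]


def create_client_pipelines(clients=("client_2", "client_3")):
    """Return one namespaced copy of the base pipeline per client."""
    # Imported here so normalise() stays usable without Kedro installed.
    from kedro.pipeline import node, pipeline

    base = pipeline(
        [
            node(
                func=normalise,
                inputs="raw_values",          # hypothetical dataset name
                outputs="normalised_values",  # hypothetical dataset name
                name="normalise_node",
            ),
        ]
    )
    # With namespace="client_2" the datasets become "client_2.raw_values"
    # and "client_2.normalised_values", so each client gets its own
    # catalog entries while sharing the same node code.
    return {client: pipeline(base, namespace=client) for client in clients}
```

Each namespaced pipeline also shows up as a collapsible group in Kedro-Viz, which matches the "mega nodes" in the demo.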

Jon Cohen

08/08/2023, 2:56 PM
How angrily were you using Kedro?

datajoely

08/08/2023, 2:56 PM
What do you mean in this context? I guess I was trying to show a representative way a practitioner would use it after using it in the real world for a couple of years

Jon Cohen

08/08/2023, 2:58 PM
Just joking. It says "This project is designed to be a realistic example of what Kedro looks like when used in anger."
💢 1
This all makes sense
Thank you for the guidance

datajoely

08/08/2023, 2:58 PM
oh haha
again a long time since I wrote that
let’s call it a 6/10

Jon Cohen

08/08/2023, 2:58 PM
lol
Well thank you. I'll come back when I have more questions. One of the reasons we settled on Kedro was the level of support on this channel, so your effort is appreciated
❤️ 1
👍 1
🚀 3

Nok Lam Chan

08/09/2023, 8:07 PM
Hey! I am very interested in this topic but I am off today. I'll get back to this tomorrow; meanwhile, if you have questions, please fire away 🚀