Is there any guidance on transitioning an existing project to Kedro? #questions


Jon Cohen

08/08/2023, 2:08 PM

Is there any guidance on transitioning an existing project to Kedro? I have a data science project that is currently relatively unstructured. I'm attempting to transition it to Kedro by creating a directory in the project for the Kedro project to live in and then moving the pipeline over one step at a time.

Sajid Alam

08/08/2023, 2:18 PM

We have plans to add guides on transitioning to Kedro from existing projects. But if you need help with anything in the meantime, please don't hesitate to reach out.

datajoely

08/08/2023, 2:34 PM

I think the best strategy today is to generate an empty Kedro project and transition one pipeline at a time.

👍 1

Jon Cohen

08/08/2023, 2:38 PM

An issue with that approach is that this project is mostly one pipeline. It's more scientist code than engineer code, so I wanted to avoid having to transition the whole pipeline at once. Ideally, I'd create one node at a time in the same repo so that the old and new pipelines can share data.

datajoely

08/08/2023, 2:39 PM

So a Kedro pipeline can be any granularity you want.

datajoely

08/08/2023, 2:39 PM

so if you want to merge the data processing part at the beginning, you run

kedro pipeline create data_processing

and start working there

Jon Cohen

08/08/2023, 2:42 PM

And then split them out into individual nodes?

datajoely

08/08/2023, 2:42 PM

Well, just the data processing part in this example.

Jon Cohen

08/08/2023, 2:43 PM

So then IIUC

Jon Cohen

08/08/2023, 2:47 PM

Right now there is a lot of manual work involved in running this pipeline. We have a folder of Python scripts, each of which ingests the previous step's output and then calls into the next step of data processing, and these need to be run by hand. This project is in the middle of transitioning from client 1 to clients 2 and 3, so this approach will soon stop working, as we need to start generalizing the data processing pipeline to different client data. One of our big goals in transitioning to Kedro is to get rid of this manual process and have more control over the pipeline. So you're suggesting that we make each script its own Kedro pipeline and move them over one by one with their code dependencies?

datajoely

08/08/2023, 2:50 PM

I think that makes sense

datajoely

08/08/2023, 2:50 PM

but my push is to move the smallest, most self-contained part first

datajoely

08/08/2023, 2:50 PM

and see if it works

Jon Cohen

08/08/2023, 2:50 PM

That's the goal

Jon Cohen

08/08/2023, 2:50 PM

but I think that means the Kedro project needs to be in the same repo

datajoely

08/08/2023, 2:50 PM

Whilst a bit ugly, that shouldn’t be an issue

👍 1

Jon Cohen

08/08/2023, 2:51 PM

What would the advantage be of making each script its own pipeline versus a node in a growing single pipeline?

datajoely

08/08/2023, 2:53 PM

So I think in general that’s the right way to write Kedro anyway: a pipeline can be a very small unit of logic. If you look at the demo viz on https://demo.kedro.org/ you can see each of those ‘mega nodes’ is a modular pipeline with its own namespace, e.g. Ingestion, Feature Engineering.

datajoely

08/08/2023, 2:54 PM

this was adapted from a demo project I made aggggeeeessss ago https://github.com/datajoely/modular-spaceflights

Jon Cohen

08/08/2023, 2:56 PM

How angrily were you using Kedro?

datajoely

08/08/2023, 2:56 PM

What do you mean in this context? I guess I was trying to show a representative way a practitioner would use it after using it in the real world for a couple of years

Jon Cohen

08/08/2023, 2:58 PM

Just joking. It says "This project is designed to be a realistic example of what Kedro looks like when used in anger."

💢 1

Jon Cohen

08/08/2023, 2:58 PM

This all makes sense

Jon Cohen

08/08/2023, 2:58 PM

Thank you for the guidance

datajoely

08/08/2023, 2:58 PM

oh haha

datajoely

08/08/2023, 2:58 PM

again a long time since I wrote that

datajoely

08/08/2023, 2:58 PM

let’s call it a 6/10

Jon Cohen

08/08/2023, 2:58 PM

lol

Jon Cohen

08/08/2023, 2:59 PM

Well thank you. I'll come back when I have more questions. One of the reasons we settled on Kedro was the level of support on this channel, so your effort is appreciated

❤️ 1

👍 1

🚀 3

Nok Lam Chan

08/09/2023, 8:07 PM

Hey! I am very interested in this topic, but I am off today. I'll get back to this tomorrow; meanwhile, if you have questions, please fire away 🚀
