Kedro #questions: we are discussing Kedro integration in the team and several questions arose


Sergei Benkovich

03/28/2023, 8:29 AM

we are discussing Kedro integration in the team and several questions arose, would appreciate any guidance :)
• given we have a flow but only want to run the parts where the configuration/params changed, is it possible to avoid running all processes? i know i can run select pipelines, but that's not what i'm looking for.
• mlflow / w&b integration?
  ◦ i see this package, is it the way to go or is there any other native way?
• data versioning / model versioning: is it done with mlflow or does any other option exist?
  ◦ how mature is the experiment tracking in kedro and what is planned for the future?

Yetunde

03/28/2023, 8:57 AM

Let me see if I can try to help with some of your questions:
1. I'll need some clarity here: is it that you want Kedro to figure out what's changed about your pipelines and only run the things that changed and the downstream pipelines?
2. Let's talk about MLflow and W&B separately
   a. There are a few ways that we've seen people using MLflow with Kedro.
      i. Use of `kedro-mlflow` (most common) - it's a plugin developed and supported by an awesome team - @Yolan Honoré-Rougé and @Takieddine Kadiri are here sometimes.
      ii. Use of your own hooks; some people develop their own hooks with MLflow.
   b. We did spot this old blogpost which integrated W&B with Kedro.
3. I'll talk about data versioning and model versioning separately too
   a. Data versioning
      i. Kedro supports basic functionality for this: using the `versioned: true` field on an entry in `catalog.yml` means that Kedro will "snapshot" a version of that dataset. Datasets in Kedro can also be models or images; anything can be saved.
      ii. There was talk about creating a DVC plugin but we've never seen this materialise.
   b. Model versioning
      i. Most people use MLflow but there's a growing number of Kedro-Viz users (34%) that use Experiment Tracking in Kedro.
      ii. We're working on a way for users to collaborate and share experiments using cloud storage, and we have a few usability improvements to make, e.g. a way to delete runs.
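To make the "snapshot" idea concrete, here is a minimal sketch of how a versioned entry could map to a path on disk. It assumes the layout `<filepath>/<timestamp>/<filename>` with a millisecond UTC timestamp. The helper name `versioned_path` and the example file path are hypothetical and are not part of Kedro's API.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def versioned_path(filepath: str, when: datetime) -> PurePosixPath:
    """Build a snapshot path of the form <filepath>/<timestamp>/<filename>.

    Assumption: the timestamp looks like 2023-03-28T08.57.00.000Z (UTC, ms precision).
    """
    millis = when.microsecond // 1000
    stamp = when.strftime("%Y-%m-%dT%H.%M.%S.") + f"{millis:03d}Z"
    path = PurePosixPath(filepath)
    # The original file name is repeated inside the timestamped folder.
    return path / stamp / path.name


# Hypothetical dataset path, for illustration only.
saved_at = datetime(2023, 3, 28, 8, 57, 0, tzinfo=timezone.utc)
snapshot = versioned_path("data/06_models/model.pkl", saved_at)
print(snapshot)
```

Each save would then land in its own timestamped folder, so earlier versions are kept rather than overwritten.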

Sergei Benkovich

03/28/2023, 10:47 AM

thanks for the response! so about the first point, yes: if i changed a parameter, for example, that applies to a certain block, then running the whole pipeline would only run this block and its downstream dependents
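The behaviour described here (rerun only the block whose parameter changed, plus everything downstream) can be sketched in plain Python. This is a conceptual illustration, not Kedro's implementation: the `PIPELINE` dictionary, the node names, and `nodes_affected_by` are all hypothetical. Kedro's `kedro run --from-inputs` option selects nodes by input dataset in a similar spirit, but the thread does not confirm that it covers this use case.

```python
# Hypothetical pipeline: node name -> (inputs, outputs). Not a Kedro object.
PIPELINE = {
    "preprocess": ({"raw_data", "params:clean"}, {"clean_data"}),
    "train": ({"clean_data", "params:model"}, {"model"}),
    "evaluate": ({"model", "clean_data"}, {"metrics"}),
}


def nodes_affected_by(changed):
    """Return nodes that read a changed input, directly or via upstream outputs."""
    dirty = set(changed)   # datasets/params considered stale
    affected = []          # nodes that must be rerun, in discovery order
    progress = True
    while progress:
        progress = False
        for name, (inputs, outputs) in PIPELINE.items():
            if name not in affected and inputs & dirty:
                affected.append(name)
                dirty |= outputs  # everything this node writes is now stale too
                progress = True
    return affected


print(nodes_affected_by({"params:model"}))
print(nodes_affected_by({"params:clean"}))
```

Changing only the model parameter leaves `preprocess` untouched, while changing the cleaning parameter invalidates the whole chain.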

Harsh Maheshwari

04/02/2023, 6:39 AM

You can use the `--to-outputs` argument when running Kedro (`kedro run --to-outputs=<dataset>`); it will only run the nodes necessary to produce your specified output
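As a rough illustration of what "only the necessary nodes" means, the sketch below walks backwards from a target output to the nodes that produce it and its dependencies. It is a plain-Python model under assumed names (`GRAPH`, `nodes_needed_for`, and the node names are hypothetical), not Kedro's own code.

```python
# Hypothetical pipeline: node name -> (inputs, outputs). Not a Kedro object.
GRAPH = {
    "preprocess": ({"raw_data", "params:clean"}, {"clean_data"}),
    "train": ({"clean_data", "params:model"}, {"model"}),
    "evaluate": ({"model", "clean_data"}, {"metrics"}),
}


def nodes_needed_for(target):
    """Return the set of nodes required to produce the dataset `target`."""
    # Map each dataset to the node that produces it.
    producers = {out: name for name, (_, outs) in GRAPH.items() for out in outs}
    needed, stack = set(), [target]
    while stack:
        dataset = stack.pop()
        node = producers.get(dataset)
        if node is None or node in needed:
            continue  # raw input or parameter, or node already collected
        needed.add(node)
        stack.extend(GRAPH[node][0])  # now resolve that node's own inputs
    return needed


print(sorted(nodes_needed_for("model")))
```

Asking for `model` pulls in `preprocess` and `train` but skips `evaluate`, since nothing required for `model` depends on it. Note that this selects by what the output needs, not by what changed, so it reruns upstream nodes even if their parameters are unchanged.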
