we are discussing in the team kedro integration an...
# questions
s
we are discussing kedro integration in the team and several questions arose; would appreciate any guidance :)
• given we have a flow, is it possible to run only the parts whose configuration/params changed, to avoid running all processes? i know i can run selected pipelines, but that's not what i'm looking for.
• mlflow / w&b integration?
  ◦ i see this package; is it the way to go, or is there another native way?
• data versioning / model versioning: is it done with mlflow, or do other options exist?
  ◦ how mature is experiment tracking in kedro, and what is planned for the future?
y
Let me see if I can help with some of your questions:
1. I'll need some clarity here: do you want Kedro to figure out what has changed in your pipelines and run only the parts that changed, plus the downstream pipelines?
2. Let's talk about MLflow and W&B separately.
   a. There are a few ways we've seen people use MLflow with Kedro:
      i. `kedro-mlflow` (most common). It's a plugin developed and supported by an awesome team; @Yolan Honoré-Rougé and @Takieddine Kadiri are here sometimes.
      ii. Your own hooks. Some people develop their own hooks that call MLflow.
   b. We did spot this old blog post that integrated W&B with Kedro.
3. I'll talk about data versioning and model versioning separately too.
   a. Data versioning
      i. Kedro supports basic functionality for this: setting the `versioned: true` field on an entry in `catalog.yml` means that Kedro will "snapshot" a version of that dataset. Datasets in Kedro can also be models or images; anything can be saved.
      ii. There was talk of creating a DVC plugin, but we've never seen it materialise.
   b. Model versioning
      i. Most people use MLflow, but a growing share of Kedro-Viz users (34%) use Experiment Tracking in Kedro.
      ii. We're working on a way for users to collaborate and share experiments using cloud storage, and we have a few usability improvements to make, e.g. a way to delete runs.
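To make the "snapshot" idea in 3.a.i concrete, here is a minimal stdlib-only sketch of timestamped versioning. It is not Kedro's code: the function names `save_versioned` and `load_latest`, the JSON payload, and the `<path>/<version>/<filename>` layout are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_versioned(base: Path, payload: dict, version: Optional[str] = None) -> Path:
    """Write payload to <base>/<version>/<base.name> without touching older versions."""
    # Hypothetical version label: a UTC timestamp that sorts chronologically as text.
    version = version or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
    target = base / version / base.name
    # exist_ok=False: refuse to overwrite a snapshot that already exists.
    target.parent.mkdir(parents=True, exist_ok=False)
    target.write_text(json.dumps(payload))
    return target


def load_latest(base: Path) -> dict:
    """Load the newest snapshot, relying on the sortable version names."""
    latest = max(p.name for p in base.iterdir() if p.is_dir())
    return json.loads((base / latest / base.name).read_text())
```

Each save lands in its own version folder, so earlier outputs stay available for comparison or rollback; loading without an explicit version picks the most recent one.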
s
thanks for the response! about the first point: yes. for example, if i change a parameter that applies to a certain block, then running the whole pipeline should run only that block and its downstream dependents
h
You can pass the `--to-outputs` argument to `kedro run`; it runs only the nodes needed to produce the outputs you specify.
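A rough sketch of the two selection strategies discussed in this thread, in plain Python rather than Kedro's API. The pipeline in `NODES` and the helpers `nodes_to_outputs` and `nodes_from_inputs` are hypothetical; they only illustrate the graph logic. Selecting "to outputs" walks upstream from a target, while the original question is closer to walking downstream from a changed parameter.

```python
from collections import deque

# Hypothetical pipeline: node name -> (input datasets, output datasets).
NODES = {
    "preprocess": ({"raw_data", "params:clean"}, {"clean_data"}),
    "featurize": ({"clean_data", "params:features"}, {"features"}),
    "train": ({"features", "params:model"}, {"model"}),
    "report": ({"clean_data"}, {"summary"}),
}


def nodes_to_outputs(nodes, targets):
    """Nodes required to produce the target datasets (upstream walk)."""
    producers = {out: name for name, (_, outs) in nodes.items() for out in outs}
    selected, queue = set(), deque(targets)
    while queue:
        name = producers.get(queue.popleft())
        if name is not None and name not in selected:
            selected.add(name)
            queue.extend(nodes[name][0])  # keep walking through this node's inputs
    return selected


def nodes_from_inputs(nodes, changed):
    """Nodes consuming the changed datasets, plus everything downstream."""
    selected, queue = set(), deque(changed)
    while queue:
        dataset = queue.popleft()
        for name, (ins, outs) in nodes.items():
            if dataset in ins and name not in selected:
                selected.add(name)
                queue.extend(outs)  # outputs of a rerun node are stale too
    return selected


print(sorted(nodes_to_outputs(NODES, ["model"])))
print(sorted(nodes_from_inputs(NODES, ["params:features"])))
```

In this toy graph, targeting `model` still selects `preprocess`, whereas starting from the changed `params:features` selects only `featurize` and `train`. Neither approach detects what changed on its own; you still have to name the output or the changed input yourself. Kedro's CLI also lists a `--from-inputs` option that may be closer to the second function, so it is worth checking the documentation for your version.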