Is there a standard way to do selective execution ...
# questions
o
Is there a standard way to do selective execution of pipelines that depend on whether the output is up to date? Something like the functionality that is the main idea in make?
d
IMO this falls in the realm of orchestration, and Kedro can be integrated with the orchestrator of your choice.
p
Is there an orchestrator that Kedro can integrate with that lets you run only the nodes needed to compute the x out of y outputs (or output partitions) that are missing or not up to date?
d
To the best of my understanding, all of the existing Kedro orchestrator integrations are quite rudimentary (i.e. they mostly focus on getting the DAG translation done), and they don't have anything built specifically for this. The Kedro team is looking at how to improve the experience of deploying to orchestrators and making some best-in-class integrations, but this revamp is in an early stage. Therefore, I would start by looking at which orchestrators support this. I know Dagster supports this (see https://docs.dagster.io/concepts/partitions-schedules-sensors/backfills), and there has been some recent community work on a Kedro and Dagster integration that I'm excited about. Airflow also seems to have some mechanism like this (see https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dag-run.html#catchup), and I'm sure some other orchestrators do, too. I honestly don't know anything about the ergonomics of these. Disclaimer: I currently work for Dagster.
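For concreteness, here is a minimal, illustrative sketch of the Dagster pattern referenced above: a partitioned asset. The asset name and dates are placeholders, not anything from this thread; the point is only the shape of the API. Once an asset is partitioned, a backfill launched from the Dagster UI or CLI materializes only the partitions you select, e.g. the ones that are missing or stale.
```python
# Illustrative sketch of a Dagster daily-partitioned asset (names and dates
# are placeholders). A backfill launched from the Dagster UI or CLI then
# materializes only the selected partitions, e.g. the missing ones.
from dagster import DailyPartitionsDefinition, Definitions, asset

daily = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(partitions_def=daily)
def daily_report(context) -> None:
    # context.partition_key is the date string of the partition being run,
    # so only the requested days are recomputed.
    context.log.info(f"Computing report for {context.partition_key}")


defs = Definitions(assets=[daily_report])
```
The Airflow catchup mechanism mentioned above works at the DAG-run level: with catchup enabled, the scheduler creates runs for past schedule intervals that have not yet been executed. A minimal sketch, assuming Airflow 2.4+ and again using placeholder names:
```python
# Illustrative sketch of Airflow catchup (placeholder dag_id/task_id; assumes
# Airflow 2.4+ for the `schedule` parameter). With catchup=True the scheduler
# backfills a run for every past interval that has not been executed yet.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="daily_report",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,
):
    EmptyOperator(task_id="compute_report")
```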
p
Thanks a lot! I'll have a look.