Is there a standard way to do selective execution ...
# questions
o
Is there a standard way to do selective execution of pipelines that depend on whether the output is up to date? Something like the functionality that is the main idea in make?
d
IMO this falls in the realm of orchestration, and Kedro can be integrated with the orchestrator of your choice.
p
Is there an orchestrator that Kedro can integrate with that lets you run only the nodes needed to compute the x out of y outputs (or output partitions) that are missing or not up to date?
d
To the best of my understanding, all of the existing Kedro orchestrator integrations are quite rudimentary (i.e. they mostly focus on getting the DAG translation done), and they don't have anything built specifically for this. The Kedro team is looking at how to improve the experience of deploying to orchestrators and making some best-in-class integrations, but this revamp is in an early stage. Therefore, I would start by looking at which orchestrators support this. I know Dagster supports this (see https://docs.dagster.io/concepts/partitions-schedules-sensors/backfills), and there has been some recent community work on a Kedro and Dagster integration that I'm excited about. Airflow also seems to have some mechanism like this (see https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dag-run.html#catchup), and I'm sure some other orchestrators do, too. I honestly don't know anything about the ergonomics of these. Disclaimer: I currently work for Dagster.
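For concreteness, here is a minimal, illustrative sketch of the Dagster pattern referenced above: a partitioned asset. The asset name and dates are placeholders, not anything from this thread; the point is only the shape of the API. Once an asset is partitioned, a backfill launched from the Dagster UI or CLI materializes only the partitions you select, e.g. the ones that are missing or stale.
```python
# Illustrative sketch of a Dagster daily-partitioned asset (names and dates
# are placeholders). A backfill launched from the Dagster UI or CLI then
# materializes only the selected partitions, e.g. the missing ones.
from dagster import DailyPartitionsDefinition, Definitions, asset

daily = DailyPartitionsDefinition(start_date="2024-01-01")


@asset(partitions_def=daily)
def daily_report(context) -> None:
    # context.partition_key is the date string of the partition being run,
    # so only the requested days are recomputed.
    context.log.info(f"Computing report for {context.partition_key}")


defs = Definitions(assets=[daily_report])
```
The Airflow catchup mechanism mentioned above works at the DAG-run level: with catchup enabled, the scheduler creates runs for past schedule intervals that have not yet been executed. A minimal sketch, assuming Airflow 2.4+ and again using placeholder names:
```python
# Illustrative sketch of Airflow catchup (placeholder dag_id/task_id; assumes
# Airflow 2.4+ for the `schedule` parameter). With catchup=True the scheduler
# backfills a run for every past interval that has not been executed yet.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="daily_report",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,
):
    EmptyOperator(task_id="compute_report")
```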
p
Thanks a lot! I'll have a look.