Ofir03/05/2023, 10:46 PM
Deepyaman Datta03/06/2023, 5:53 AM
Ofir03/06/2023, 2:55 PM
From one of the tasks in our Prefect workflow
Deepyaman Datta03/06/2023, 3:21 PM
without having to containerize anything. Basically, it should be possible to automatically convert Kedro node -> Prefect task and Kedro pipeline -> Prefect flow (or maybe even modular pipeline -> flow and overall pipeline -> flow of flows). I think mapping dataset behavior is a bit more challenging, since Prefect has a different method of handling task results (can you control where to write each result from a task? I'm not sure...). I can try to write up a Kedro issue on this (or feel free to post one to our GitHub), as I've also been keen on exploring Prefect 2 deployment for a while (I personally think it would be very cool to have a PrefectRunner for Kedro out of the box as a default deployment option), but I can't promise any timeline. Let me see if anybody else on the team has heard much in the way of general Prefect deployment demand. 🙂
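A rough sketch of what that node -> task / pipeline -> flow conversion could look like with the Prefect 2 Python API. This is not an existing plugin; the catalog-based dataset handling (loading/saving through the Kedro catalog instead of Prefect task results) and names like project_root and kedro_flow are assumptions for illustration:

from pathlib import Path

from prefect import flow, task

from kedro.framework.project import pipelines
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.io import MemoryDataSet  # MemoryDataset in newer Kedro releases

project_root = Path.cwd()  # assumption: the Kedro project lives here


def node_to_task(node, catalog):
    # Wrap a single Kedro node as a Prefect task that reads its inputs
    # from the Kedro catalog and writes its outputs back to it.
    @task(name=node.name)
    def _run():
        inputs = {name: catalog.load(name) for name in node.inputs}
        for name, data in node.run(inputs).items():
            catalog.save(name, data)

    return _run


@flow(name="kedro-default-pipeline")
def kedro_flow():
    bootstrap_project(project_root)
    with KedroSession.create(project_path=project_root) as session:
        catalog = session.load_context().catalog
        pipeline = pipelines["__default__"]

        # Back any datasets missing from the catalog with in-memory
        # datasets so intermediate results can pass between tasks
        # (this only works while tasks share a process).
        for name in pipeline.data_sets() - set(catalog.list()):
            catalog.add(name, MemoryDataSet())

        # pipeline.nodes is topologically sorted, so a sequential loop
        # respects data dependencies between nodes.
        for node in pipeline.nodes:
            node_to_task(node, catalog)()


if __name__ == "__main__":
    kedro_flow()

Mapping each modular pipeline to its own flow and calling those from a parent flow would give the flow-of-flows variant.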
Ofir03/06/2023, 3:22 PM
Can the kedro docker build command build a deployment package of our Kedro pipeline onto a Docker image?
Deepyaman Datta03/06/2023, 8:43 PM
@Deepyaman Datta what are the most common deployment scenarios you experience with Kedro's customers? Dask? Airflow?
I think @Yetunde @Ivan Danov or @Merel can probably answer this with more data, but from what I've seen/recall:
• Databricks is very common, although that is probably skewed by the heavy use of Spark and Databricks amongst QuantumBlack/McKinsey projects.
• Other cloud-based workflow orchestrators are often seen, such as Vertex AI (managed Kubeflow Pipelines) and Azure ML pipelines. GetInData (a company unaffiliated with QuantumBlack/McKinsey) has developed a number of plugins in this space, and we see a fair number of people looking to use such things in #plugins-integrations (anecdotally).
• I think Airflow is next most common, but I could be wrong on this (it could be above some of the cloud-based orchestrators).
• We've also seen SageMaker, Argo Workflows, etc., or even more closed platforms like Dataiku.
• Dask is quite uncommon (😢).
Can the kedro docker build command build a deployment package of our Kedro pipeline onto a Docker image?
It can. kedro docker build is quite simple, in that it will package everything into a single Docker image, and then you can run whatever pipelines you want. If you're happy with this, it's a fine option. If you want each node to map to a task in an orchestrator, it's not as great for that approach.
Ofir03/07/2023, 1:12 PM
Zhe An03/16/2023, 4:44 PM
Deepyaman Datta03/16/2023, 5:24 PM
Ofir05/05/2023, 10:05 AM
from a Prefect workflow? What are the disadvantages of doing such a thing?
Deepyaman Datta05/06/2023, 9:11 PM
The main disadvantage is using Python to call a CLI endpoint that runs Python functions.
• At the very minimum, I don't think using os.system is great for capturing potential errors, etc. But this can be remedied by using subprocess module methods (which are used heavily in Kedro actually), which you can check output/logs/error codes on.
• Better yet, if you do want to essentially call kedro run from Python, you could do something like:

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_root = Path.cwd()  # path to the Kedro project

bootstrap_project(project_root)
with KedroSession.create(project_path=project_root, env="prefect") as session:
    session.run()

(copied the boilerplate from https://docs.kedro.org/en/stable/integrations/databricks_workspace.html#running-kedro-project-from-a-databricks-notebook actually)
• Finally, if you want more fine-grained control and/or are using bits and pieces of Kedro with more Prefect, you can construct the runner + pipelines independently, configure them, and run. Alternatively (and ideally) a plugin/hook handles most of the dirty work here.
The main reason I think some of the existing deployment guides do the shelled execution is because a lot of workflow orchestrators expect a shell script of sorts as a task, or Python tasks are often newer features. I think, where orchestrators support Python tasks, we may as well take advantage of that.
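For the subprocess route above, a minimal sketch of what checking the result could look like from a Prefect 2 task (the task wrapper and the pipeline name are illustrative assumptions, not from the thread):

import subprocess

from prefect import task


@task
def run_kedro(pipeline_name: str = "__default__"):
    # Unlike os.system, subprocess.run exposes the return code and captured
    # output, so a failed kedro run can fail the Prefect task with context.
    result = subprocess.run(
        ["kedro", "run", "--pipeline", pipeline_name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"kedro run failed:\n{result.stderr}")
    return result.stdout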