# questions
d
Hello, I have more of a discussion topic as opposed to a question. Sorry if this kind of post isn't allowed! I want to start off by saying I love Kedro, and I want to use it as much as possible. That prompts my question: how is everyone going about integrating Kedro with the cloud platforms they work on? Many cloud vendors seem to push ready-made notebook environments on users, and I'm curious what workflows people have for integrating the two. For context, my team is small, so we basically handle everything from early cleaning and EDA through to deployment and monitoring. We currently don't use any cloud vendors and work in a kind of cyclic pattern: some exploration in notebooks (using the Kedro Data Catalog), then formally writing the cleaning nodes, then back to notebooks to experiment a bit, then back to formalizing nodes, and so on until the pipeline is finished. For those of you on the cloud, is this similar? Do you work only on the cloud? Partially on the cloud and partially locally? Or locally but deploy to the cloud? Would love to hear opinions! ☺️
👀 2
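A minimal sketch of that notebook-to-node loop, assuming a recent Kedro (0.18+) project; the catalog entry `companies` and the cleaning function are hypothetical:

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.pipeline import node, pipeline

# --- notebook side: explore against the same Data Catalog the pipeline uses ---
bootstrap_project(Path.cwd())  # or `%load_ext kedro.ipython` inside Jupyter
with KedroSession.create() as session:
    catalog = session.load_context().catalog
    companies = catalog.load("companies")  # hypothetical dataset name

# --- once the cleaning logic settles, promote it to a node ---
def drop_empty_rows(df):
    """Cleaning step first prototyped in the notebook."""
    return df.dropna(how="all")

cleaning_pipeline = pipeline(
    [node(drop_empty_rows, inputs="companies", outputs="companies_clean")]
)
```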
n
It's very common to have this iteration loop. For notebook vs. script, it's not really a big problem because it's fairly easy to convert between the two, or you can always have your notebook call a function from your script (or library). The bigger question is local vs. cloud: it's less trivial if you want to run things both locally and on the cloud, because you also need to think about where your data is sitting, etc. The solution will also be platform specific. For Databricks, some teams use dbx (a CLI tool that syncs a local repository to Databricks), Databricks Repos, or Databricks Connect (mostly limited to Spark workflows).
👍 1
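The "notebook calling a function from your script" pattern keeps the two from drifting apart. A rough sketch, where the package path `my_project` and the function are hypothetical:

```python
# src/my_project/pipelines/cleaning/nodes.py  (hypothetical module path)
import pandas as pd

def clean_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Drop cancelled orders and normalise column names."""
    orders = orders[orders["status"] != "cancelled"]
    orders.columns = [c.lower() for c in orders.columns]
    return orders

# In the notebook, import the same function and iterate on it interactively,
# instead of copy-pasting cells:
#   from my_project.pipelines.cleaning.nodes import clean_orders
#   cleaned = clean_orders(catalog.load("orders"))
# The identical function is later wired into the pipeline as a node.
```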
For deployment, Kedro is mostly platform agnostic. The experience will depend on how well your platform is supported, but there are different plugins (mostly community maintained) to help you convert a Kedro pipeline into a platform-specific one (e.g. SageMaker or Azure pipelines, Airflow DAGs).
👍 1
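The rough idea behind most of those plugins, sketched below with plain Kedro APIs (the node functions are placeholders, not any specific plugin's code): each Kedro node becomes one task on the target platform, and shared dataset names define the dependencies between tasks.

```python
from kedro.pipeline import Pipeline, node

def clean(raw):  # placeholder node functions
    return raw

def featurise(clean_df):
    return clean_df

pipe = Pipeline(
    [
        node(clean, inputs="raw_data", outputs="clean_data", name="clean"),
        node(featurise, inputs="clean_data", outputs="features", name="featurise"),
    ]
)

# One task per Kedro node; the shared dataset names become the edges of the
# DAG that a converter would hand to Airflow, SageMaker, Azure, etc.
tasks = {
    n.name: {"inputs": list(n.inputs), "outputs": list(n.outputs)}
    for n in pipe.nodes
}
print(tasks)
```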
d
This makes sense. I think one of the most difficult things when trying to plan is where code is going to be written and/or where it is going to be run (local vs. cloud). All the combinations can be hard to think through and test. It would be nice if the Kedro community had a place with examples of workflows being used on Azure, Databricks, AWS, etc. It would probably help a lot of us advocate for Kedro in our team's stack more effectively ☺️
💯 3
j
I think this is a great idea @Dylan F, we’ll give this some thought!
👍 1