# questions
j
Hi everyone, Has anyone used Dagster and Kedro in combination? I'm considering using them together and would like some insights. I know there is a side project, the kedro-dagster plugin repo, which is in its early stages of development, but it doesn't have comprehensive documentation yet, and I'm unsure about the integration process.
d
To the best of my knowledge, @Guillaume Tauzin and @FlorianGD are the ones who have done the most work in exploring this. I am also very excited about using them together (naturally, as a Kedro maintainer and Dagster employee), and think they would be great in combination, but I haven't had the chance to try using them together yet. If you have specific questions, I'm more than happy to try and find answers. If it's an important piece of the puzzle for your company to use Dagster in a bigger way, I can also check whether it's something a Dagster employee who works more on the integrations side may be able to weigh in on!
j
Have a look at https://github.com/gtauzin/kedro-dagster by @Guillaume Tauzin
d
@Juan Luis pretty sure he has 🙂
I know there is a side project, the kedro-dagster plugin repo, which is in its early stages of development, but it doesn't have comprehensive documentation yet, and I'm unsure about the integration process.
j
oops 🙊
g
Hi @Jakub Szafranski. Indeed, it's still a WIP. I am happy to discuss the integration process and get some feedback. Also, on my side, it would be interesting to understand your expectations from this kedro/dagster combination.
j
Thank you all so much for your responses! I really appreciate the time and effort you put into helping me out. 🙂 I am currently facing a dilemma and would love to get your thoughts on it. Here’s the situation: I am considering whether to use Kedro + Dagster or just Dagster alone. We haven't written our pipelines in either of the tools yet, so we are not tied to any specific solution. Kedro seems simpler and has a more straightforward syntax. However, I am concerned that using Kedro + Dagster might lead to losing some crucial functionalities from Dagster that we might not be aware of yet. Additionally, I worry that maintaining and learning both Kedro and Dagster could be more challenging and time-consuming in the long run. Despite Kedro's simpler syntax, the combined effort of learning and integrating both technologies might outweigh the benefits. On the other hand, focusing solely on Dagster might make our lives easier in the long run, as we would avoid the complexities of combining the two tools and fully leverage Dagster's capabilities. Given these points, what would you recommend? Is it better to use Kedro + Dagster, or should we stick with just Dagster? What are the key factors we should consider in making this decision?
g
Are you asking more generally why one would use Kedro plus an orchestrator instead of just using the orchestrator directly, or is it specific to Dagster? I can share my perspective as a user, but I guess the Kedro team has many more interesting things to say! :) But maybe, first, a bit of context would help:
• How big is your organization/team?
• What kind of DS projects do you work on?
• Is your organization already using Dagster?
• Why did you/they pick Dagster?
j
I'm asking specifically about Dagster. Unfortunately, I can't provide specific details about our organization or project 😅. Thanks for your response and your willingness to help! Guess I'll reach out to the Kedro team directly for more information.
g
From my own experience trying to build kedro-dagster, I do not yet see many limitations in defining pipelines or data assets directly in Kedro and having them translated into Dagster jobs, assets, and IO managers. I cannot say for sure there are none, but in kedro-dagster I am trying to translate all Kedro concepts (datasets, pipelines, configuration, hooks, etc.) into Dagster without limiting Dagster. Concretely, this means that a kedro-dagster `definitions.py` would import all Kedro-translated Dagster objects, and the user would be free to edit them or add things like schedules before passing everything to the Dagster `Definitions` at the end of the file. As the project is still at an early stage, I can't tell you for sure that a user will be able to fully leverage Dagster's capabilities. But this is the ultimate objective (and any help is welcome!). To me there are several advantages to using Kedro on top of Dagster. To cite just a few:
• It is very easy to create new data assets and pipelines in Kedro; it is much harder to do so in Dagster. Kedro's docs, tutorials, and Slack community are all extremely high quality. Typically, data scientists are not experts in software engineering, so working directly with Dagster can be daunting.
• Kedro structures your DS project well and handles configuration and data connector definitions. This takes away a lot of the complexity of working on a DS project.
• Kedro is orchestrator-independent, so if you decide to change orchestrators later because the current one no longer matches your needs, you don't have to rewrite everything.
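To make the translation idea above a bit more concrete, here is a toy sketch in plain Python. It does not use Kedro, Dagster, or kedro-dagster, and every name in it (`ToyNode`, `ToyAsset`, `translate`, `materialize`) is hypothetical. It only illustrates the general shape of the mapping described in the message: each pipeline node becomes one asset keyed by its output, and the node's inputs become that asset's upstream dependencies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Kedro-style node: a function plus dataset names."""
    func: Callable
    inputs: tuple
    output: str


@dataclass(frozen=True)
class ToyAsset:
    """Hypothetical stand-in for an orchestrator asset: a key, its deps, and how to compute it."""
    key: str
    deps: tuple
    compute: Callable


def translate(nodes):
    """Map each node to one asset; the node's inputs become the asset's dependencies."""
    return [ToyAsset(key=n.output, deps=n.inputs, compute=n.func) for n in nodes]


def materialize(assets, sources):
    """Compute assets in dependency order, starting from already-available source data."""
    values = dict(sources)
    pending = list(assets)
    while pending:
        # An asset is ready once all of its upstream values exist.
        ready = [a for a in pending if all(d in values for d in a.deps)]
        if not ready:
            raise ValueError("unresolvable dependencies")
        for a in ready:
            values[a.key] = a.compute(*(values[d] for d in a.deps))
            pending.remove(a)
    return values


def clean(raw):
    return [x for x in raw if x is not None]


def total(cleaned):
    return sum(cleaned)


# Nodes are deliberately listed out of order; the dependencies decide the run order.
pipeline = [
    ToyNode(total, ("cleaned",), "total"),
    ToyNode(clean, ("raw",), "cleaned"),
]
assets = translate(pipeline)
result = materialize(assets, {"raw": [1, None, 2]})
```

In this toy model the translated `assets` list is ordinary data, so a user could inspect or extend it before handing it to a runner, which is roughly the editing step the message describes for `definitions.py`. How kedro-dagster actually exposes its translated objects is not shown here.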
j
Thank you so much for your detailed response and insights! I really appreciate the time and effort you put into helping me 🙂 Helped a lot!