# user-research
a
Hi, I'm trying to compare Kedro to a similar framework, Flyte. Do you maybe know it, and could you point out the main differences in features and design principles? From what I've observed:
• They don't handle data abstractions/loading as well as Kedro does through the data catalog.
• They try to support dynamic pipelines and also introduce branching in pipelines (I'm getting Airflow DAG/orchestration vibes here). I know the topic of dynamic pipelines in Kedro, but I wonder what the deal is with branching (dynamically evaluated conditions that say part of the pipeline won't execute). Is it also a dynamic-pipelines problem? From Kedro's perspective I think it would be difficult to do: one could hide that logic inside a node, or in nodes at the start that check some arg, but that's also problematic because you can't easily disable Kedro's outputs by returning Nones. My gut feeling is that I've never needed this functionality because Kedro does not try to be an orchestrator with complex logic; rather, the tool that runs the pipelines defined in Kedro could provide such logic.
• In Flyte most of the config is local to the code (decorator params), which worries me: it could become messier and less reusable (without extra effort) compared to the more centralized approach with configs and pipeline definitions in Kedro.
• Flyte's tasks are designed with distribution and parallelism in mind as the default approach, with local runs more of a bonus. Kedro is similar, but I feel like parallelism is trickier there, and local & sequential execution is what's best supported. Maybe Flyte only looks good at first glance and gets a bit uglier when you get down to it.
• They seem to have a bit more functionality in the browser/hub: data visualisations, run time breakdown/history, docstring parsing and browsing pipelines/nodes with it in the UI. kedro-viz is slowly catching up here, and we rely on MLflow being a great addition/help with all of that.
• Flyte is written in Go and is multi-lingual, while Kedro is Python-native and Python-only.
• They integrate with about the same number of services as Kedro does with plugins, but they also implement agents for those services... I don't really know why/what for, probably some technicality related to orchestration.
• Flyte seems to be designed around Kubernetes, while Kedro seems designed around Python.
• Kedro has great support for configs (loading/versioning/templating capabilities), while I don't see any support for that in Flyte.
• Flyte seems more complex to understand, with a steeper learning curve. I also don't see any starters to plug & play so far.
• Kedro gives your code/projects a common & clear structure, while Flyte is more freestyle, which also does not aid readability.
This list got longer than I intended. Please share your thoughts about these points ^, especially whether I got anything wrong about Kedro. I'm also interested in the opinions of people who know more about Flyte and would like to share how it compares.
👀 2
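To make the branching point above concrete, here is a minimal sketch of what "hiding the condition inside a node" tends to look like. The `run` helper and the node tuples are hypothetical stand-ins written for illustration, not Kedro's API; the only assumption carried over from the message is that a node's declared outputs are always written, so a skipped branch can only be signalled by returning `None`.

```python
from typing import Any, Callable, Optional

# Hypothetical mini-runner, NOT Kedro's API: each node declares its inputs and
# its output name up front, and the runner always stores whatever is returned.
Node = tuple[Callable[..., Any], list[str], str]


def run(nodes: list[Node], data: dict[str, Any]) -> dict[str, Any]:
    """Run nodes in order; every declared output is written, even if None."""
    data = dict(data)
    for func, inputs, output in nodes:
        data[output] = func(*(data[name] for name in inputs))
    return data


def maybe_clean(raw: list[int], enabled: bool) -> Optional[list[int]]:
    # Branch hidden inside the node: "skipping" can only be signalled with None.
    return [x for x in raw if x >= 0] if enabled else None


def summarise(cleaned: Optional[list[int]]) -> Optional[int]:
    # Downstream nodes still run and must guard against the skipped branch.
    return sum(cleaned) if cleaned is not None else None


nodes: list[Node] = [
    (maybe_clean, ["raw", "enabled"], "cleaned"),
    (summarise, ["cleaned"], "total"),
]

on = run(nodes, {"raw": [3, -1, 4], "enabled": True})
off = run(nodes, {"raw": [3, -1, 4], "enabled": False})
print(on["total"], off["total"])
```

In this toy version the "disabled" run still produces a `cleaned` entry and still executes `summarise`, so every downstream node needs its own `None` check. That is the awkwardness described above, and it is why this kind of logic is usually easier to express in whatever tool schedules the pipeline.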
d
Sorry, I meant to respond to this earlier and forgot. I think the main difference is that Flyte is an orchestrator, whereas Kedro is not (despite it regularly featuring on lists of top Python orchestrators 😅). To the best of my understanding, each Flyte task runs in a separate container when deployed (much like with Kubeflow Pipelines, Prefect, etc.). While the docs get you started with trivial examples, you likely don't want to spin up a container for each line of pandas code, so the chunk of logic that belongs to a task should be determined by your desired deployment. To this end, Flyte could be another deployment target for Kedro (Kedro is not opinionated about orchestration, at least at this time). I think this is further supported by the fact that Flyte compares itself to Airflow, Kubeflow, etc.
👍 1
I think the most Kedro-esque frameworks are dbt core (for SQL) and Hamilton (which calls itself a micro-orchestration framework from what I recall, to distinguish from stuff like Prefect, Dagster, etc.).
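On the task-granularity point above (not spinning up a container per line of pandas), a rough sketch of the idea: keep fine-grained nodes in the pipeline definition, and collapse them into coarser tasks only when mapping to an orchestrator. The node names and the grouping key here are made up for illustration; in a real project the key might be a tag or a namespace, and that is an assumption, not something either tool prescribes.

```python
from itertools import groupby

# Hypothetical ordered node list: (node name, deployment group).
nodes = [
    ("load_raw", "preprocess"),
    ("drop_nulls", "preprocess"),
    ("add_features", "preprocess"),
    ("train_model", "train"),
    ("evaluate", "train"),
]


def group_into_tasks(ordered_nodes):
    """Collapse consecutive nodes sharing a group into one orchestrator task."""
    return [
        (group, [name for name, _ in members])
        for group, members in groupby(ordered_nodes, key=lambda item: item[1])
    ]


tasks = group_into_tasks(nodes)
for group, names in tasks:
    # Each entry would correspond to one container-level task when deployed.
    print(f"{group}: {names}")
```

Five small nodes become two deployable tasks here, which is the kind of decision that depends on the deployment target rather than on how the pipeline logic is written.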
s
Can Flyte create and start Vertex AI / Kubeflow pipelines without requiring a server or an agent in the cloud? A killer feature of Kedro for me is the ability to work locally and then push a Kubeflow pipeline to execute in the cloud. Prefect cannot do that; it requires you to host a server, either on your own infra or using Prefect Cloud.
a
I don't see any integrations with Vertex; there is one with SageMaker, though. I'm not sure how it works, but I think it relies on agents. It requires setting up a k8s cluster and works with the AWS SageMaker k8s operator to make the connection. At this point I'm not sure why you would want to integrate it that way. https://docs.flyte.org/en/latest/deployment/plugins/aws/sagemaker.html
And to be clear: I'm no Flyte advocate, I just want to know how Kedro compares to other similar tools/frameworks - in this thread Flyte in particular 🙂
I've recently come across this Hamilton as well, but it seems kinda primitive, especially lacking on the visualisation of pipelines end
d
I've recently come across this Hamilton as well, but it seems kinda primitive, especially lacking on the visualisation of pipelines end
Agreed 🙂 They strive to be lighter-weight than Kedro from what I've heard, but it mostly feels like less mature functionality, in my (obviously likely biased) view