Is anybody aware of any work done around integrating kedro c Kedro #questions

Iñigo Hidalgo

02/28/2024, 1:01 PM

Is anybody aware of any work done around integrating Kedro components like the DataCatalog into Prefect blocks (https://docs.prefect.io/latest/concepts/blocks/)? I'm working on a personal project using Prefect and Delta tables to run a small ETL pipeline, and I'm trying to think of ways to use the Kedro DataCatalog in the "Prefect" world without actually writing Kedro pipelines: https://github.com/inigohidalgo/prefect-polygon-etl

Iñigo Hidalgo

02/28/2024, 1:08 PM

Relevant issue in my project https://github.com/inigohidalgo/prefect-polygon-etl/issues/3

Iñigo Hidalgo

02/28/2024, 1:15 PM

The Prefect deployment guide in the Kedro docs seems quite thorough, but it goes much further than I want to go, and it doesn't use Blocks, which is the Prefect component I'm trying to integrate at the moment.

datajoely

02/28/2024, 1:24 PM

Blocks came out long after we put that guide together

datajoely

02/28/2024, 1:24 PM

so if there is a new, better way of doing so, we’re all ears

Iñigo Hidalgo

02/28/2024, 1:27 PM

Yeah, I assumed as much. I remember reading a Kedro Prefect deployment guide a couple of years ago, and it was already a bit outdated by then, since it targeted Prefect 1 and Prefect 2 was already out. It does seem to have been updated to Prefect 2, though.

“if there is a new, better way of doing so, we’re all ears”

I honestly don't know if there is. That guide is about converting pipelines to flows, but what I want is a Block that contains a DataCatalog, which flows on different machines could access to load data without needing any config set up on those machines. Blocks seem to be just wrappers around Pydantic models, so I'm not even really sure whether what I have in mind is possible or simple.

Iñigo Hidalgo

02/28/2024, 1:29 PM

I've only just started using Prefect, so there's probably a recommended way to do this with Prefect components directly, but since I already have a few Kedro datasets developed for different purposes, I'd rather use Kedro components for config if I can and just wrap them in Prefect objects.

Nok Lam Chan

02/28/2024, 1:33 PM

The Prefect guide hasn't been updated for a while; for deployment, I think we rely a lot more on the community to share their approaches. We are in the process of reviewing Airflow (which we see as the major open-source orchestrator). The Prefect guide was updated for Prefect 2 a while ago; maybe Blocks are something they released more recently?

datajoely

02/28/2024, 1:34 PM

Thinking about it, I’m not sure forcing Kedro concepts other than parameters into Pydantic-like objects is a good fit, but I'm interested to see where this goes.

Iñigo Hidalgo

02/28/2024, 1:36 PM

My mind is pushing me in this direction:
- setup flow, run every X from a centralized config repo: config.yml -> OmegaConfigLoader -> JSON Block
- runtime flow: DataCatalog.from_config(JSON Block)

datajoely

02/28/2024, 1:37 PM

Oh I see what you mean

Iñigo Hidalgo

02/28/2024, 1:37 PM

So: use the Blocks to store the parsed and resolved config, then load that config and instantiate the DataCatalog at runtime.
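A minimal, standard-library-only sketch of that setup/runtime split, under loudly stated assumptions: the in-memory _BLOCK_STORE dict stands in for Prefect block storage (in Prefect 2 this would be something like JSON(value=...).save(name) and JSON.load(name).value from prefect.blocks.system), the resolved dict stands in for what OmegaConfigLoader would return for the "catalog" key, and the hypothetical build_catalog and TextDataset stand in for Kedro's DataCatalog.from_config and a real Kedro dataset. None of the function names below come from Kedro or Prefect.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical stand-in for Prefect block storage (e.g. a JSON block).
_BLOCK_STORE: dict[str, str] = {}


class TextDataset:
    """Hypothetical minimal dataset, standing in for a Kedro dataset class."""

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath

    def load(self) -> str:
        return Path(self.filepath).read_text()

    def save(self, data: str) -> None:
        Path(self.filepath).write_text(data)


# Maps the "type" string in the config to a class (Kedro imports the class by name instead).
DATASET_TYPES: dict[str, type] = {"TextDataset": TextDataset}


def setup_flow(block_name: str, resolved: dict[str, Any]) -> None:
    """Setup flow: persist already-resolved catalog config as a JSON document."""
    _BLOCK_STORE[block_name] = json.dumps(resolved)


def load_config(block_name: str) -> dict[str, Any]:
    """Runtime flow, step 1: fetch the config; no local config files needed."""
    return json.loads(_BLOCK_STORE[block_name])


def build_catalog(config: dict[str, Any]) -> dict[str, Any]:
    """Runtime flow, step 2: build one dataset per entry from its 'type' plus kwargs."""
    catalog = {}
    for name, entry in config.items():
        kwargs = dict(entry)  # copy so the stored config is not mutated
        dataset_cls = DATASET_TYPES[kwargs.pop("type")]
        catalog[name] = dataset_cls(**kwargs)
    return catalog


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        setup_flow("prices-catalog", {"prices": {"type": "TextDataset", "filepath": f"{tmp}/prices.txt"}})
        catalog = build_catalog(load_config("prices-catalog"))
        catalog["prices"].save("ticker,close\n")
        print(catalog["prices"].load())
```

The reason to store config rather than a live catalog is that plain JSON data fits naturally into a Pydantic-backed Block, whereas a DataCatalog object is not obviously something you can serialize into one.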

datajoely

02/28/2024, 1:37 PM

interesting

datajoely

02/28/2024, 1:38 PM

I actually really want the team to build Pipeline.from_json and Pipeline.to_json, which I think would help with pushing smaller bits of Kedro to orchestrators

Iñigo Hidalgo

02/28/2024, 1:39 PM

kedro-glass

👀

datajoely

02/28/2024, 1:39 PM

never

datajoely

02/28/2024, 1:39 PM

You would still write Python code; it would just be portable.

Iñigo Hidalgo

02/28/2024, 1:40 PM

That's interesting. So basically you'd represent the pipeline's inputs/outputs as strings, and the function each node executes as its dotted-path string?
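At the time of this thread Pipeline.to_json and Pipeline.from_json were only a wish, so the following is a hypothetical, standard-library-only sketch of the representation described here: each node is a dotted function path plus dataset names as strings, which round-trips through JSON and is resolved back to a callable with importlib at run time. NodeSpec, to_json, from_json, resolve and run are made-up names, and the sketch assumes the nodes are already in execution order (Kedro itself sorts nodes by their dependencies; this does not).

```python
import importlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable


@dataclass
class NodeSpec:
    func: str  # dotted path to the function, e.g. "operator.add"
    inputs: list[str]  # dataset names passed as positional arguments
    outputs: list[str]  # names under which the result(s) are stored


def to_json(nodes: list[NodeSpec]) -> str:
    return json.dumps([asdict(node) for node in nodes])


def from_json(payload: str) -> list[NodeSpec]:
    return [NodeSpec(**item) for item in json.loads(payload)]


def resolve(dotted: str) -> Callable[..., Any]:
    """Turn "package.module.attr" back into the object it names."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def run(nodes: list[NodeSpec], data: dict[str, Any]) -> dict[str, Any]:
    """Execute nodes in the given order, reading inputs from and writing outputs to a copy of `data`."""
    data = dict(data)
    for node in nodes:
        result = resolve(node.func)(*(data[name] for name in node.inputs))
        # A node with several outputs is expected to return a tuple of that length.
        results = result if len(node.outputs) > 1 else (result,)
        data.update(zip(node.outputs, results))
    return data


if __name__ == "__main__":
    pipeline = [
        NodeSpec("operator.add", ["a", "b"], ["total"]),
        NodeSpec("math.sqrt", ["total"], ["root"]),
    ]
    payload = to_json(pipeline)  # only strings, no function objects
    print(run(from_json(payload), {"a": 9, "b": 7}))
    # prints {'a': 9, 'b': 7, 'total': 16, 'root': 4.0}
```

Because the payload holds only strings it can be shipped anywhere, but the receiving machine still needs the referenced code to be importable, which matches the point that you would still write Python code and it would just be portable.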

datajoely

02/28/2024, 1:42 PM

Yeah, you can read about it in this research piece: https://github.com/kedro-org/kedro/issues/3094

Iñigo Hidalgo

02/28/2024, 1:54 PM

Interesting piece, thanks for sharing. In the project I'm working on at the moment, I'd like to avoid using the full Kedro functionality and stick to bare Prefect flows and tasks, so, like the deployment guide, this is much more in-depth than I need, but I'll be following the thread for future updates :) Out of curiosity, have you looked into how Prefect separates code and execution? I haven't looked much into it, since I'm running everything on a single machine and haven't needed that separation yet, but from my understanding you can specify where a script or module is stored (e.g. S3, Git) and Prefect will pull that code on each execution.

datajoely

02/28/2024, 1:54 PM

It’s not something I know in detail

datajoely

02/28/2024, 1:55 PM

but I do put Prefect, Dagster and Flyte in the category of orchestrators I’d use if I were starting a company
