# questions
j
Hi, are there any plans to make Kedro natively built for streaming (Spark Streaming for reading, writing, deleting, and merging streaming data) without using custom nodes and hooks?
j
hi @Jamal Sealiti! how do you solve the lack of streaming support at the moment? or, put another way, what do you mean by "custom nodes and hooks"?
j
custom datasets
the disadvantage is that I have to write a lot of Spark logic to handle streaming
As it is, Kedro seems more batch-oriented to me
j
at the moment yes. but it's a question that comes up from time to time, so I collected some earlier examples of that https://github.com/kedro-org/kedro/discussions/4754
👍 1
paging @Deepyaman Datta, he's been interested in this for a while
j
So are there plans in the near future to further develop Kedro into a version that is more streaming-oriented?
j
unclear if in the "near future", but we do want to work on having more flexible I/O for Data Engineering pipelines and streaming would fall within this. we're talking about ~months
🥳 1
j
Good to hear 🙂 I'm looking forward to getting this part in place. I have compared Kedro with Beam, dbt, and other frameworks, and I actually liked Kedro; once streaming handling is in place, Kedro will be the perfect framework for my purposes
K 1
d
@Jamal Sealiti Do you have much existing experience with streaming? Have you tried the approach in https://kedro.org/blog/kedro-dataset-for-spark-structured-streaming? I'd be curious about your thoughts on the gaps.
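For reference, wiring the `spark.SparkStreamingDataset` from that blog post into a project is mostly a catalog-level change. A rough sketch of a catalog entry (dataset name, path, and format here are illustrative, not from the thread):

```yaml
# conf/base/catalog.yml -- hypothetical entry using kedro-datasets'
# spark.SparkStreamingDataset; the name and filepath are placeholders
raw_events_stream:
  type: spark.SparkStreamingDataset
  filepath: data/01_raw/events_stream/
  file_format: json
```

Nodes reading this dataset then receive a streaming Spark DataFrame instead of a batch one.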
So are there plans in the near future to further develop Kedro into a version that is more streaming-oriented?
I'd say, again, this depends on what it entails/what the current gaps are. IMO the biggest issue with Kedro for streaming isn't defining the logic (if you can write it in Python, you can shoehorn it into Kedro 😉 to an extent); the bigger issue is that `kedro run` (or, more generally, running at a point in time) doesn't make sense for streaming, and you need to "deploy" the streaming application. In that regard, there's been little work so far, partially because there also haven't been users looking to do true streaming work. Realistically, what you would do would probably be quite similar to what https://github.com/getindata/dbt-flink-adapter does for dbt. Another powerful approach with Kedro could be to use Ibis (with either a Flink or Spark Streaming backend), if you don't specifically want Spark Streaming.
👍 1
j
@Deepyaman Datta Thank you! What I meant was that it would be nice if Kedro handled streaming without needing a lot of Spark/Flink logic for reading, writing, merging, deleting, decoding Parquet files, schema validation, etc. For example, I had to create a custom dataset that reads from Kafka.
d
Ah, I understand
So are there plans in the near future to further develop Kedro into a version that is more streaming-oriented?
In that case, I highly doubt this would happen in the near future, unless it's driven by the community; Kedro is a fairly unopinionated structurer of Python code, and this works fairly well for batch workflows. I agree streaming requires more things to be built in to work, and this could be a plugin or something, but I don't know how much demand there has been for this to be prioritized by the core team at this point.
👍 1