Hi all I am new to kedro and have gone thru intro docs so th Kedro #questions

Samrat

03/13/2025, 1:16 AM

Hi all, I am new to Kedro and have gone through the intro docs, so this is probably a dumb question. Most of the examples in the docs use static datasets (CSV, SQL, etc.), but I don't see any examples for ticking datasets, i.e. something that explains how the pipeline works if the Data Catalog has a real-time event stream source, e.g. Kafka. Can someone direct me to docs or examples?

Deepyaman Datta

03/13/2025, 1:42 AM

https://medium.com/quantumblack/kedro-goes-streaming-34e1094c354c But, in general, streaming workloads are less well supported with Kedro.

Samrat

03/13/2025, 2:01 AM

@Deepyaman Datta Just went through the link above. So it seems Kedro is more geared toward batch than streaming. Given I am still evaluating Kedro and similar tools, is there any other framework you would recommend that supports both batch and streaming natively?

Samrat

03/13/2025, 4:19 AM

Or any plans to support Ray?

Merel

03/13/2025, 7:59 AM

Hi @Samrat, we don't have any official plans, but we have this issue open about documenting Ray with Kedro: https://github.com/kedro-org/kedro/issues/479 Feel free to add a comment there so we can track interest.

Deepyaman Datta

03/13/2025, 2:37 PM

@Samrat what functionality are you looking for? Most pipelining frameworks and orchestrators don't really support streaming, because they're focused on materializing things when requested, not continuously. Do you basically just want something to organize batch + streaming code, or do you want to handle deployment?

Samrat

03/13/2025, 4:03 PM

@Deepyaman Datta I am looking to organize batch + streaming code and be able to deploy in prod

Deepyaman Datta

03/13/2025, 4:06 PM

IMO, technically the issue with using something like Kedro isn't that it doesn't work for organizing streaming code; it's that `kedro run` doesn't really make sense (unless you want to stream outputs to console, which is not what you do in prod). So you can use Kedro to organize whatever streaming code (Spark Streaming, PyFlink, Ibis, etc.), but you would need to implement a `deploy` command that isn't there out of the box.
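
To illustrate the distinction drawn above, here is a minimal sketch in plain Python (standard library only, no Kedro APIs). It assumes the transformation logic can be written as a pure per-record function. Every name in it (`enrich`, `run_batch`, `deploy_stream`, `fake_stream`) is hypothetical and not part of Kedro. The same function is reused by a one-shot batch run, which materializes its output and exits, and by a long-running consumer loop, which is roughly what a custom `deploy` entry point would have to start.

```python
from typing import Callable, Iterable, Iterator

Event = dict  # hypothetical record type: one event from a stream or one row of a batch


def enrich(event: Event) -> Event:
    """Pure per-record transformation shared by the batch and streaming paths."""
    return {**event, "amount_usd": round(event["amount"] * event["fx_rate"], 2)}


def run_batch(events: list[Event]) -> list[Event]:
    """One-shot run: process a finite input, materialize the full output, return."""
    return [enrich(e) for e in events]


def deploy_stream(source: Iterable[Event], sink: Callable[[Event], None]) -> None:
    """Long-running job: handle each event as it arrives and push the result to a sink.

    With a real source (e.g. a Kafka consumer) this loop would not terminate on its own.
    """
    for event in source:
        sink(enrich(event))


def fake_stream() -> Iterator[Event]:
    """Stand-in for a real event source, used only to exercise the sketch."""
    yield {"id": 1, "amount": 10.0, "fx_rate": 1.1}
    yield {"id": 2, "amount": 5.0, "fx_rate": 2.0}


if __name__ == "__main__":
    # Batch: the caller asks for the result once and gets it back.
    print(run_batch(list(fake_stream())))
    # Streaming: results are emitted continuously to the sink (here, print).
    deploy_stream(fake_stream(), print)
```

The organizing part (keeping `enrich` as a reusable, testable unit) works the same in both modes. What differs is the entry point, which is the piece that would need to be written for a streaming deployment.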
