# questions
s
Hi all, I am new to Kedro and have gone through the intro docs, so this is probably a dumb question. Most of the examples in the docs use static datasets (CSV, SQL, etc.), but I don't see any examples for ticking datasets, i.e. something that explains how the pipeline works if the Data Catalog has a real-time event stream source, e.g. Kafka. Can someone direct me to docs or examples?
h
Someone will reply to you shortly. In the meantime, this might help:
d
https://medium.com/quantumblack/kedro-goes-streaming-34e1094c354c But, in general, streaming workloads are less well supported with Kedro.
s
@Deepyaman Datta Just went through the above link. So it seems Kedro is more geared toward batch than stream. Given I am still evaluating Kedro and similar tools, is there any other framework you would recommend that supports both batch and stream natively?
Or any plans to support Ray?
m
Hi @Samrat, we don't have any official plans, but we have this issue open about documenting Ray with Kedro: https://github.com/kedro-org/kedro/issues/479 Feel free to add a comment there so we can track interest.
d
@Samrat what functionality are you looking for? Most pipelining frameworks and orchestrators don't really support streaming, because they're focused on materializing things when requested, not continuously. Do you basically just want something to organize batch + streaming code, or do you want to handle deployment?
s
@Deepyaman Datta I am looking to organize batch + streaming code and be able to deploy in prod
d
IMO, technically the issue with using something like Kedro isn't that it doesn't work for organizing streaming code; it's that `kedro run` doesn't really make sense (unless you want to stream outputs to console, which is not what you do in prod). So you can use Kedro to organize whatever streaming code you have (Spark Streaming, PyFlink, Ibis, etc.), but you would need to implement a `deploy` command that isn't there out of the box.
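To make that concrete, here is a rough sketch in plain Python, with no Kedro or streaming-engine APIs involved. It shows one piece of shared logic reused by a one-shot batch run and by a long-lived `deploy`-style loop. Every name in it (`enrich`, `run_batch`, `deploy`, `fake_stream`) is hypothetical and only for illustration; in a real project the generator would be replaced by something like a Kafka consumer or a Spark/Flink job.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]


def enrich(event: Event) -> Event:
    """Shared business logic: the 'node' that both modes reuse."""
    return {**event, "notional": event["price"] * event["qty"]}


def run_batch(events: List[Event], node: Callable[[Event], Event]) -> List[Event]:
    """Batch mode: materialize the whole output once, then return."""
    return [node(e) for e in events]


def deploy(source: Iterable[Event], node: Callable[[Event], Event]) -> Iterator[Event]:
    """Streaming mode: lazily process events for as long as the source yields them."""
    for event in source:
        yield node(event)


def fake_stream() -> Iterator[Event]:
    """Stand-in for an unbounded source such as a Kafka topic (hypothetical)."""
    for i in itertools.count(1):
        yield {"price": 10.0, "qty": float(i)}
```

The batch function returns once its input is exhausted, which is the model `kedro run` fits. The streaming function never finishes on its own, so something outside the pipeline has to start, supervise, and stop it.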