For now, is apache spark the only way to feed live...
# questions
k
For now, is apache spark the only way to feed live data to kedro?
j
hi @Kevin Kim, what do you mean with live data?
what other systems do you have in mind?
k
I don't yet know what type of data I'll be receiving. By live data, I mean data that is constantly updated by machines, like sensor data (for example).
Another example could be real time stock price, I guess?
j
I see - Kedro is inherently a batch system, so you could simulate stream processing with micro-batches (i.e. performing a `kedro run` repeatedly on small batches). In that sense, any input will work - but for now, the only streaming dataset is the `SparkStreamingDataset` described in https://kedro.org/blog/kedro-dataset-for-spark-structured-streaming
We notably lack a connector for, say, Kafka streams.
@Deepyaman Datta did some experiments on https://github.com/deepyaman/kedro-streaming - maybe he can comment more 🙂
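For illustration, the micro-batch idea above could look roughly like the sketch below. This is not an official Kedro pattern: the helper names (`micro_batches`, `run_kedro_on_batch`, `process_stream`) and the file path are hypothetical, and it assumes your catalog has an entry that reads the JSON file each batch is written to.

```python
import itertools
import json
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def micro_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a (possibly endless) iterable of records into lists of at most batch_size."""
    iterator = iter(records)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_kedro_on_batch(batch: List[dict], path: Path) -> None:
    """Write one batch to disk, then trigger a normal batch run.

    Assumption: a catalog entry in your project points at `path`.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(batch))
    # One ordinary `kedro run` per micro-batch, executed from the project root.
    subprocess.run(["kedro", "run"], check=True)


def process_stream(
    records: Iterable[dict],
    batch_size: int,
    handle_batch: Callable[[List[dict]], None],
) -> int:
    """Feed each micro-batch to handle_batch and return how many batches were handled."""
    count = 0
    for batch in micro_batches(records, batch_size):
        handle_batch(batch)
        count += 1
    return count
```

In a real project you would pass something like `lambda b: run_kedro_on_batch(b, Path("data/01_raw/latest_batch.json"))` as `handle_batch` (the path is just an example); passing a plain function instead makes the batching logic easy to try without Kedro installed.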
k
Oh I've seen both the links, sounds great. Thanks!
🙌🏼 1
🙌 1
d
@Kevin Kim are you looking to use something in particular (if not Spark)? My team and I have been working on some stuff that should, in theory, make it possible to use Flink with something like Kedro (not that we've actually tried using Kedro). If you'd like to share more about your use case, either here or privately, I'd be happy to see if it's something we could help enable.
👍 2
👀 1
n
What's the use case of your application? Having live data != streaming; it's very unlikely you need a real-time pipeline. You may want something like Kafka to stream the data, but only process it in larger batches later.
👍 1
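As a rough sketch of that "stream in, process in larger batches later" idea: the buffer below collects incoming records and hands them to a batch job once enough have arrived or the oldest one is old enough. Everything here is hypothetical (`BatchBuffer`, its parameters and thresholds), and the Kafka consumer is deliberately left out; you would call `add` from whatever consumer loop you use.

```python
import time
from typing import Any, Callable, List


class BatchBuffer:
    """Collect records and flush them as one batch by size or by age."""

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        max_records: int = 1000,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush              # batch job to call, e.g. one that starts a pipeline run
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the age rule can be tested
        self._records: List[Any] = []
        self._started = 0.0

    def add(self, record: Any) -> None:
        """Buffer one record; flush if the batch is full or too old."""
        if not self._records:
            # The age of a batch is measured from its first record.
            self._started = self._clock()
        self._records.append(record)
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._started >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the batch job, if there are any."""
        if self._records:
            batch, self._records = self._records, []
            self._flush(batch)
```

Note that the age rule is only checked when a new record arrives, so a quiet stream needs an explicit `flush()` (for example on shutdown or from a timer) to avoid leaving records in the buffer.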