For now, is Apache Spark the only way to feed live data to Kedro? #questions

# questions

Kevin Kim

10/25/2023, 6:38 AM

For now, is Apache Spark the only way to feed live data to Kedro?

Juan Luis

10/25/2023, 6:53 AM

Hi @Kevin Kim, what do you mean by live data?

Juan Luis

10/25/2023, 6:53 AM

What other systems do you have in mind?

Kevin Kim

10/25/2023, 6:54 AM

I haven't been told yet what type of data I'll be getting. By live data, I mean data that is constantly updated by machines, like sensor data (for example).

Kevin Kim

10/25/2023, 6:56 AM

Another example could be real-time stock prices, I guess?

Juan Luis

10/25/2023, 6:58 AM

I see - Kedro is inherently a batch system, so you could simulate stream processing with micro-batches (that is, performing a `kedro run` repeatedly with small batches). In that sense, any input will work, but for now the only streaming dataset is the `SparkStreamingDataset` described in https://kedro.org/blog/kedro-dataset-for-spark-structured-streaming (we notably lack a connector for, say, Kafka streams).
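The micro-batch idea in this message can be sketched in plain Python. This is only an illustration under stated assumptions: `micro_batches`, `run_micro_batches`, and the `run_pipeline` callback are hypothetical names, not part of Kedro. In a real project the callback might write each batch where the data catalog expects it and then trigger a `kedro run`.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream of records into small lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source is exhausted
        yield batch


def run_micro_batches(
    records: Iterable[T],
    batch_size: int,
    run_pipeline: Callable[[List[T]], None],
) -> int:
    """Call run_pipeline once per micro-batch and return the number of runs.

    run_pipeline is a hypothetical stand-in for one batch pipeline run.
    """
    runs = 0
    for batch in micro_batches(records, batch_size):
        run_pipeline(batch)
        runs += 1
    return runs
```

For example, `run_micro_batches(sensor_readings, 100, process)` would call a hypothetical `process` once for every 100 readings, plus once more for any remainder.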

Juan Luis

10/25/2023, 6:58 AM

@Deepyaman Datta did some experiments in https://github.com/deepyaman/kedro-streaming - maybe he can comment more 🙂

Kevin Kim

10/25/2023, 6:59 AM

Oh, I've seen both links, sounds great. Thanks!

🙌🏼 1

🙌 1

Deepyaman Datta

10/25/2023, 7:49 AM

@Kevin Kim are you looking to use something in particular (if not Spark)? My team and I have been working on some stuff that should, in theory, make it possible to use Flink with something like Kedro (not that we've actually tried using Kedro). If you'd like to share more about your use case, either here or privately, I'd be happy to see if it's something we could help enable.

👍 2

👀 1

Nok Lam Chan

10/25/2023, 9:22 AM

What's the use case of your application? Having live data != streaming; it's very unlikely you need a real-time pipeline. You may want something like Kafka to stream the data, but only process the data in a larger batch later.

👍 1
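The "collect now, process in a larger batch later" pattern suggested above can be sketched as follows. This is an illustration only: a local JSON Lines file stands in for a durable buffer such as a Kafka topic, and `append_events` and `drain_buffer` are hypothetical helper names, not Kafka or Kedro APIs. The sketch also assumes a single writer and reader and does not guard against concurrent access.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def append_events(buffer_path: Path, events: Iterable[Dict[str, Any]]) -> None:
    """Ingestion side: append each live event to the buffer as one JSON line."""
    with buffer_path.open("a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def drain_buffer(buffer_path: Path) -> List[Dict[str, Any]]:
    """Batch side: read everything collected so far, then clear the buffer."""
    if not buffer_path.exists():
        return []
    with buffer_path.open("r", encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    buffer_path.unlink()  # start a fresh buffer for the next batch
    return events
```

The ingestion side can call `append_events` as often as data arrives, while a scheduled batch job calls `drain_buffer` and hands the result to the pipeline as an ordinary batch input.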
