Is Structured Streaming real-time?

We know that Flink is a true real-time stream processing engine that processes each record as soon as it arrives, and we also know that Spark Streaming is a micro-batch stream processing engine.

However, Spark has since released Structured Streaming. What about that? Is it a true streaming processor like Flink, one that handles each record right away when it arrives, or is it still using micro-batch mode?



2 answers


Is Structured Streaming a real-time streaming engine?

TL;DR: No. Or yes. It depends on the definition of "real-time stream processing engine".

As of 2.3.0-SNAPSHOT (the current master), Structured Streaming uses micro-batches, and nothing seems to suggest it will be different in future releases.

A deep dive into the Structured Streaming engine

StreamExecution (the runtime for a streaming query) starts a separate thread of execution that checks for new records.



Once started, microBatchThread (which is a regular Java java.lang.Thread object) executes runBatches, which runs one batch per trigger interval.

As you walk through the code, you can see what the internal execution engine of a streaming query does for each trigger.

My understanding is that nothing has changed with regard to micro-batching. It was the model in Spark Streaming, and it is also used in Structured Streaming.
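To make the per-trigger loop concrete, here is a toy Python sketch of the general pattern. It is not Spark's code: `MicroBatchRunner`, `run_once`, and the deque-based source are hypothetical names invented for illustration. Only the overall shape follows the description above: a dedicated thread wakes up once per trigger interval and processes whatever has arrived since the previous wake-up as one batch.

```python
import threading
from collections import deque


class MicroBatchRunner:
    """Toy model of a micro-batch loop (illustrative only, not Spark's implementation)."""

    def __init__(self, source, trigger_interval_s, process_batch):
        self._source = source                # deque standing in for an input source (assumption)
        self._interval = trigger_interval_s  # seconds to wait between triggers
        self._process_batch = process_batch  # callback that receives one list per batch
        self._stop = threading.Event()
        # Dedicated thread, analogous in role to the microBatchThread mentioned above.
        self._thread = threading.Thread(target=self._run_batches, daemon=True)

    def run_once(self):
        """Drain everything that has arrived so far and process it as a single batch."""
        batch = []
        while self._source:
            batch.append(self._source.popleft())
        if batch:  # skip the callback when nothing new has arrived
            self._process_batch(batch)
        return len(batch)

    def _run_batches(self):
        # One iteration per trigger interval, until stop() is requested.
        while not self._stop.is_set():
            self.run_once()
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()
```

The point of the sketch is that records are never handed to `process_batch` one at a time as they arrive. A record waits in the source until the next trigger fires, which is the latency characteristic that separates micro-batching from record-at-a-time processing.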


Shameless plug: you can explore the topic in more detail by reading my gitbook on Structured Streaming, which I am writing for this very purpose: to understand the lowest-level details. Comments are welcome.
