What is the meaning of "Stages" in the Spark UI for streaming applications?

I am working on Spark Streaming, trying to monitor and improve the performance of streaming applications, but I am confused by the following questions:

  • What does each stage shown in the Spark UI represent for a "Spark Streaming" application?
  • Not all transformations seem to map to tasks. How can I relate each transformation to its associated tasks?

Streaming code snippet:

// processInput, reduce and finalize are application-specific helpers
val transformed = input.flatMap(i => processInput(i))
val aggregated = transformed.reduceByKeyAndWindow(
  reduce(_, _),
  Seconds(aggregateWindowSizeInSeconds),
  Seconds(slidingIntervalInSeconds))
val finalized = aggregated.mapValues(finalize(_))
finalized


(Only the flatMap stage shows up in the UI.)


Thanks,

Tao





1 answer


Spark takes the individual operations from your code and optimizes them into a plan of tasks that will run on the cluster. One example of such an optimization is map fusion: two consecutive map calls appear in the code, but a single map task is issued. A stage is a higher-level boundary between groups of tasks, drawn so that crossing it requires a shuffle.
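A minimal batch-mode sketch of that idea (the StageDemo object and its sample data are invented for illustration, not taken from the asker's streaming job):

import org.apache.spark.{SparkConf, SparkContext}

object StageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("StageDemo").setMaster("local[*]"))

    val words = sc.parallelize(Seq("a", "b", "a", "c"))

    // Two consecutive narrow transformations: Spark pipelines (fuses)
    // them into the tasks of a single stage.
    val pairs = words.map(w => (w, 1)).map { case (w, n) => (w.toUpperCase, n) }

    // reduceByKey needs a shuffle, so a stage boundary falls here:
    // everything above runs in one stage, the reduction in the next.
    val counts = pairs.reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}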

So:

  • Each operator you call on an RDD is either a transformation or an action.
  • These calls build up a DAG of operators.
  • The DAG is compiled into stages.
  • Each stage is executed as a series of tasks (a concrete way to inspect this follows below).
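One way to inspect those stage boundaries is RDD.toDebugString, which prints the lineage with a new indented block for every shuffle. A short sketch, assuming a SparkContext named sc is already in scope (e.g. in spark-shell) and that input.txt is a hypothetical input file:

val lineage = sc.textFile("input.txt")
  .flatMap(_.split("\\s+"))
  .map((_, 1))
  .reduceByKey(_ + _)

// Each indented block in the printed lineage corresponds to one stage;
// a new block begins wherever a shuffle is required.
println(lineage.toDebugString)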



