How many servers are listening during Spark Streaming?

I am reviewing my cluster configuration and would like to harden security by minimizing the number of machines that can actually make HTTP connections to the outside world.

So my question is: when doing Spark Streaming (say, from a Twitter feed), is the driver the only server listening on that stream and then redistributing the data to the executors as RDDs, or does each executor listen to the stream?


1 answer


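Spark Streaming launches one long-running task for each receiver created when you set up the streaming context. These receivers are scheduled on executors running on nodes of the cluster, so it is the executor hosting a receiver, not the driver, that connects to the source. The received data is stored as blocks on that executor (and, with the default replicated storage level, copied to a second executor inside the cluster), and each batch is then processed as an RDD built from those blocks.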
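A minimal sketch of that layout, assuming a plain socket source as a stand-in for the Twitter feed (socketTextStream is a built-in receiver-based stream; the object name, host names and port are hypothetical). Each socketTextStream call creates one receiver, so this job keeps two executor cores busy receiving, and both sockets are opened from the executors that host those receivers rather than from the driver.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object ReceiverLayoutSketch {
  // Each socketTextStream call creates one receiver-based input stream.
  // When the context starts, Spark runs one long-running task per receiver
  // on an executor; that executor (not the driver) opens the socket.
  def buildInput(ssc: StreamingContext, sources: Seq[(String, Int)]): DStream[String] = {
    val streams: Seq[DStream[String]] =
      sources.map { case (host, port) => ssc.socketTextStream(host, port) }
    // Downstream code sees a single DStream; every batch becomes an RDD
    // whose blocks were stored by the receivers on their executors.
    ssc.union(streams)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("receiver-layout-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Hypothetical sources: two feeds, hence two receivers.
    val lines = buildInput(ssc, Seq(("feed-a.example.com", 9999), ("feed-b.example.com", 9999)))
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

When running locally, the master must provide more cores than there are receivers (for example local[3] for the two receivers here); otherwise no cores are left to process the batches.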

If you want to choose the host on which each receiver is started, you have to extend Receiver and override its preferredLocation method, which returns None by default (leaving placement to Spark):

    override def preferredLocation: Option[String]
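A sketch of such a receiver, assuming a line-oriented socket source; PinnedSocketReceiver and its parameters are hypothetical names. preferredLocation is a scheduling preference, so the named host must actually run an executor for Spark to honor it; in exchange, only that worker needs outbound access to the source.

```scala
import java.io.{BufferedReader, IOException, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver pinned to one worker host, so only that machine
// opens connections to the external source.
class PinnedSocketReceiver(sourceHost: String, sourcePort: Int, workerHost: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

  // Ask Spark's receiver scheduler to start this receiver on workerHost.
  override def preferredLocation: Option[String] = Some(workerHost)

  override def onStart(): Unit = {
    // onStart must not block, so read from the source on its own thread.
    new Thread("pinned-socket-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  // Nothing to release here: receive() exits once isStopped() is true.
  override def onStop(): Unit = ()

  private def receive(): Unit = {
    try {
      val socket = new Socket(sourceHost, sourcePort)
      try {
        val reader = new BufferedReader(
          new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
        var line = reader.readLine()
        while (!isStopped() && line != null) {
          store(line) // hand the record to Spark on this executor
          line = reader.readLine()
        }
      } finally {
        socket.close() // also closes the socket's input stream
      }
      if (!isStopped()) restart("Source closed the connection, reconnecting")
    } catch {
      case e: IOException => restart("Error reading from source", e)
    }
  }
}
```

It would be registered with ssc.receiverStream(new PinnedSocketReceiver("feed.example.com", 9999, "worker-1")), where worker-1 stands for the host name of one of your executors.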