Why do I only see one Spark Streaming KafkaReceiver?

I'm confused as to why I see only one KafkaReceiver on the Spark UI page (port 8080). My Kafka topic has 10 partitions and my Spark cluster has 10 cores, and my Python code looks like this:

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 10})

I assumed the number of KafkaReceivers would be 10, not 1. I'm quite puzzled. Thanks in advance!



1 answer


kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 10})
This code creates 1 receiver with 10 consumer threads. Each thread attaches to one partition, but all of the data is pulled by that single receiver, which occupies 1 core. All other cores will (potentially) process the received data.

If you want 10 receivers, each connected to 1 partition and each using 1 core, you should do something like this (in Scala; my Python is weak, but you get the idea):



val recvs = (1 to 10).map(i => KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", Map(topic -> 1)))
val kafkaData = ssc.union(recvs)

Note that you need additional Spark cores to process the received data.
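Since the question is in Python, here is a rough PySpark sketch of the same pattern; it assumes `ssc` (a running StreamingContext), `zkQuorum`, and `topic` are already defined as in the question, and it cannot run without a live Spark/Kafka setup:

```python
# Create 10 receivers, each consuming 1 partition, then union them
# into a single DStream for downstream processing.
from pyspark.streaming.kafka import KafkaUtils

num_partitions = 10  # matches the number of Kafka partitions
streams = [
    KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    for _ in range(num_partitions)
]
kafka_data = ssc.union(*streams)
```

Each `createStream` call here produces its own receiver (one core each), so you need 10 cores for the receivers plus spare cores for processing.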
