One executor runs much longer than the others in a Spark Streaming job


I integrate Spark Streaming with Kafka. In one of the stages, one executor is much slower than the others.

As you can see in the picture, the executor h10.zw runs for 2.6 minutes, and its "task time" is 52 minutes, which is much longer than that of the other executors. But its shuffle size and record count are the same as the others'.

What does "task time" mean? What is the executor h10.zw doing? How can I balance the run time across all executors to avoid time skew?



1 answer


Depending on your exact processing, this may be caused by data skew. Try enabling speculative execution and repartitioning your data into smaller partitions. This should help determine whether that is the case.
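To make the skew check concrete, here is a minimal pure-Python sketch, not Spark code. It assumes simple hash partitioning (Spark's real partitioner hashes differently) and a made-up key set with one hot key. The helper names `partition_sizes` and `skew_ratio` are hypothetical. It shows how a single dominant key keeps one partition large no matter how many partitions you use, which is the pattern to look for.

```python
import zlib


def partition_sizes(keys, num_partitions):
    """Count records per partition under simple hash partitioning.

    Assumption: crc32 stands in for the real partitioner's hash; it is
    used here only because it is stable across Python runs.
    """
    sizes = [0] * num_partitions
    for key in keys:
        p = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        sizes[p] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical workload: one hot key plus many distinct keys.
keys = ["hot"] * 9000 + ["user-%d" % i for i in range(1000)]

for n in (4, 16, 64):
    sizes = partition_sizes(keys, n)
    print(n, "partitions: largest =", max(sizes),
          "skew ratio = %.1f" % skew_ratio(sizes))
```

If the largest partition barely shrinks as the partition count grows, a hot key is the likely cause, and smaller partitions alone will not fix it. In Spark itself, speculative execution is controlled by the `spark.speculation` configuration property, and `repartition(n)` changes the number of partitions. Note that in the question the shuffle size and record count are equal across executors, so the slowdown could also come from per-record cost or from the node itself.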


