Why does the task size keep growing when using updateStateByKey?

I wrote a trivial function to use with updateStateByKey to check whether the problem was caused by my updateFunc. It seems the cause must be something else. I am running this with --master local[4].

val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  Some(1)
}

val state = test.updateStateByKey[Int](updateFunc)

      

After a while, warnings appear and the reported task size keeps increasing:

WARN TaskSetManager: Stage x contains a task of very large size (129 KB). The maximum recommended task size is 100 KB.

WARN TaskSetManager: Stage x contains a task of very large size (131 KB). The maximum recommended task size is 100 KB.



1 answer


You probably have more and more distinct keys coming out of your stream, each one adding a new entry (another copy of the value 1) to your state.

The current updateStateByKey scans every key in every batch interval, even if there is no new data for that key. As a result, the batch processing time of updateStateByKey grows with the number of keys in the state, even if the data rate stays fixed.
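To make the effect concrete, here is a minimal sketch in plain Scala (no Spark required) that models the behaviour described above. It is a simplified model, not Spark's actual implementation: `step`, `keepAll`, `dropIdle` and the sample batches are hypothetical names made up for illustration. The `dropIdle` variant relies on the documented rule that a key is removed from the state when the update function returns None; whether dropping idle keys is acceptable depends on your use case.

```scala
// Simplified model of updateStateByKey semantics in plain Scala (not Spark code).
// All names here (step, keepAll, dropIdle) are hypothetical, for illustration only.
object StateGrowthSketch {
  type Update = (Seq[Int], Option[Int]) => Option[Int]

  // One batch interval: the update function runs for every key already in the
  // state as well as every key in the new batch, so the work grows with state size.
  def step(state: Map[String, Int], batch: Seq[(String, Int)], f: Update): Map[String, Int] = {
    val grouped = batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val keys = state.keySet ++ grouped.keySet
    keys.flatMap { k =>
      // A result of None drops the key; Some(v) keeps it with value v.
      f(grouped.getOrElse(k, Seq.empty), state.get(k)).map(v => k -> v)
    }.toMap
  }

  // Same function as in the question: every key ever seen stays in the state.
  val keepAll: Update = (_, _) => Some(1)

  // Variant that returns None for keys with no new data, which removes them.
  val dropIdle: Update = (values, _) => if (values.isEmpty) None else Some(1)

  def main(args: Array[String]): Unit = {
    // Three batches, each introducing keys not seen before.
    val batches = Seq(Seq("a" -> 1, "b" -> 1), Seq("c" -> 1), Seq("d" -> 1))
    val grown  = batches.foldLeft(Map.empty[String, Int])(step(_, _, keepAll))
    val pruned = batches.foldLeft(Map.empty[String, Int])(step(_, _, dropIdle))
    println(s"keepAll state size:  ${grown.size}")   // 4: a, b, c, d all retained
    println(s"dropIdle state size: ${pruned.size}")  // 1: only d, seen in the last batch
  }
}
```

With `keepAll`, the state holds every key ever seen, so each batch touches more keys than the last. With `dropIdle`, keys that receive no data in a batch are removed, which keeps the state bounded at the cost of forgetting them.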



There is a proposal to solve this problem.
