About Spark UnsafeShuffleWriter

I have two questions about UnsafeShuffleWriter. It will be used when all three of the following conditions are met (a code sketch of this check follows the list):

  • The shuffle dependency specifies no aggregation and no output ordering.
  • The shuffle serializer supports relocation of serialized values (currently supported by KryoSerializer and Spark SQL's custom serializers).
  • The shuffle produces fewer than 16,777,216 output partitions.
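
For context, this eligibility check lives in Spark's shuffle manager. Below is a rough sketch of what it does, paraphrasing UnsafeShuffleManager.canUseUnsafeShuffle from Spark 1.4/1.5 (later versions move the check to SortShuffleManager.canUseSerializedShuffle, and its exact shape varies across releases), not the verbatim source:

    import org.apache.spark.ShuffleDependency

    // Rough paraphrase of the eligibility check; details differ by version.
    def canUseUnsafeShuffle[K, V, C](dep: ShuffleDependency[K, V, C]): Boolean = {
      if (!dep.serializer.supportsRelocationOfSerializedObjects) {
        false // condition 2: serialized bytes must be safe to reorder as-is
      } else if (dep.aggregator.isDefined || dep.keyOrdering.isDefined) {
        false // condition 1: an aggregation or output ordering was requested
      } else if (dep.partitioner.numPartitions > (1 << 24)) {
        false // condition 3: partition ids are packed into 24 bits of each
              // sorted record pointer, so only ~16.7 million partitions fit
      } else {
        true // all conditions met: take the serialized (unsafe) shuffle path
      }
    }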

I am confused about the first two conditions.

  • Why must the shuffle dependency specify no aggregation and no output ordering? It seems to me that UnsafeShuffleWriter could be used whenever mapSideCombine = false, regardless of whether an aggregation or an ordering is specified.
  • Why does the serializer have to support relocation of serialized values, and where is that relocation actually used? (An illustration follows below.)
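
To make the relocation requirement concrete, here is a small illustration of the property that supportsRelocationOfSerializedObjects advertises (my own sketch, not Spark's shuffle code): within a single serialization stream, the byte ranges of individual records can be cut apart and concatenated in a different order, and the result still deserializes correctly. UnsafeShuffleWriter depends on this because it sorts serialized records as opaque bytes and copies them into partition order without ever deserializing them.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoSerializer

    val ser = new KryoSerializer(new SparkConf).newInstance()

    // Write two records into one stream, remembering where the first ends.
    val out = new ByteArrayOutputStream()
    val s = ser.serializeStream(out)
    s.writeObject("record-a")
    s.flush()
    val splitAt = out.size()
    s.writeObject("record-b")
    s.close()

    // Cut the stream at the record boundary and swap the two byte blocks,
    // as the shuffle sorter might when reordering records by partition.
    val (a, b) = out.toByteArray.splitAt(splitAt)
    val in = ser.deserializeStream(new ByteArrayInputStream(b ++ a))

    // Each record's encoding is self-contained, so the swapped stream
    // still reads back cleanly, just in the new order.
    assert(in.readObject[String]() == "record-b")
    assert(in.readObject[String]() == "record-a")

Java serialization does not have this property: an ObjectOutputStream starts with a stream header, and later objects can hold back-references to earlier ones, so cutting and reordering the bytes would corrupt the stream. Kryo with auto-reset enabled (its default) keeps each record self-contained, which, as far as I understand, is why KryoSerializer and Spark SQL's serializers qualify.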