About Spark UnsafeShuffleWriter
I have two questions about UnsafeShuffleWriter. UnsafeShuffleWriter is used when all three of the following conditions hold (a sketch of this check appears after the list):
- The shuffle dependency specifies no aggregation or output ordering.
- The shuffle serializer supports relocation of serialized values (this is currently supported by KryoSerializer and Spark SQL's custom serializers).
- The shuffle produces fewer than 16,777,216 (2^24) output partitions.
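For context, here is a minimal, self-contained sketch of that eligibility check, modeled on the check in Spark's SortShuffleManager (canUseSerializedShuffle). The ShuffleDepInfo stand-in type and its field names are my own simplification, not Spark's API:

```scala
// Sketch of the eligibility check, modeled on Spark's
// SortShuffleManager.canUseSerializedShuffle. ShuffleDepInfo is a
// hypothetical stand-in for org.apache.spark.ShuffleDependency.
final case class ShuffleDepInfo(
    serializerSupportsRelocation: Boolean, // Serializer.supportsRelocationOfSerializedObjects
    hasAggregator: Boolean,                // dependency.aggregator.isDefined
    hasKeyOrdering: Boolean,               // dependency.keyOrdering.isDefined
    numPartitions: Int)                    // dependency.partitioner.numPartitions

object SerializedShuffleCheck {
  // The partition id is packed into 24 bits of a record pointer,
  // hence the 16,777,216 (2^24) partition limit.
  val MaxShuffleOutputPartitions: Int = 1 << 24

  def canUseSerializedShuffle(dep: ShuffleDepInfo): Boolean =
    dep.serializerSupportsRelocation &&              // condition 2
      !dep.hasAggregator &&                          // condition 1: no aggregation
      !dep.hasKeyOrdering &&                         // condition 1: no output ordering
      dep.numPartitions < MaxShuffleOutputPartitions // condition 3
}
```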
I am confused about the first two conditions.
- Why must the shuffle dependency specify no aggregation or output ordering? It seems fine to use UnsafeShuffleWriter whenever mapSideCombine = false, regardless of whether an aggregation or ordering is specified.
- Why must the serializer support relocation of serialized values, and where is that relocation actually used? (See the sketch after this list.)
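To make my second question concrete, here is my own illustration (not Spark's code) of what relocation would enable: sorting already-serialized records by partition id by moving their opaque byte chunks around, without deserializing anything.

```scala
import java.io.ByteArrayOutputStream

// My own illustration, not Spark's code: a record that has already been
// serialized to bytes, tagged with the partition it belongs to.
final case class SerializedRecord(partitionId: Int, bytes: Array[Byte])

object RelocationSketch {
  // Sort records by partition id by moving their serialized byte chunks,
  // then concatenate them. Nothing is deserialized, so this is only correct
  // if a record's bytes decode the same way wherever they land in the output
  // stream: that is the "relocation of serialized values" property. It also
  // shows why no aggregation or key ordering can be honored here: records
  // are ordered by partition id alone.
  def writePartitioned(records: Seq[SerializedRecord]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    records.sortBy(_.partitionId).foreach(r => out.write(r.bytes, 0, r.bytes.length))
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      SerializedRecord(partitionId = 2, bytes = "b".getBytes("UTF-8")),
      SerializedRecord(partitionId = 0, bytes = "a".getBytes("UTF-8")),
      SerializedRecord(partitionId = 1, bytes = "c".getBytes("UTF-8")))
    // Prints "acb": chunks regrouped by partition, content untouched.
    println(new String(writePartitioned(records), "UTF-8"))
  }
}
```

My current understanding is that a stream format with cross-record back-references (e.g. plain Java serialization) would break under this kind of reordering, which would explain the restriction, but I may be missing something.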