Does Spark co-locate a join between two partitioned DataFrames?
For the following join between two DataFrames in Spark 1.6.0:
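import org.apache.spark.sql.functions.col
// Hash-partition both sides on the join key "a" into 32 partitions and cache them
val df0Rep = df0.repartition(32, col("a")).cache()
val df1Rep = df1.repartition(32, col("a")).cache()
val dfJoin = df0Rep.join(df1Rep, "a")
println(dfJoin.count())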
Is the join not only co-partitioned but also co-located? I know that for RDDs, when both sides use the same partitioner and are shuffled in the same operation, the join is co-located. But what about DataFrames? Thank you.
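For reference, here is a minimal sketch of how the physical plan for this join could be inspected. It assumes Spark 1.6 with a local master; the object name `JoinPlanCheck`, the method `run`, and the two small input DataFrames are made up for illustration and are not part of my real job. Broadcast joins are disabled so that the small test inputs do not change the join strategy. As far as I understand, the plan can only show whether an extra shuffle (an Exchange operator) is added above the cached data, which speaks to co-partitioning; it does not by itself say whether matching partitions end up on the same executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

// Hypothetical helper object, for illustration only.
object JoinPlanCheck {
  def run(): Long = {
    val conf = new SparkConf().setAppName("join-plan-check").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // Disable broadcast joins so the tiny inputs below use a shuffle-based join.
      sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")

      // Made-up inputs: keys 1..1000 appear exactly once on each side.
      val df0 = sc.parallelize(1 to 1000).map(i => (i, i * 2)).toDF("a", "b")
      val df1 = sc.parallelize(1 to 1000).map(i => (i, i * 3)).toDF("a", "c")

      // Same repartitioning as in the question.
      val df0Rep = df0.repartition(32, col("a")).cache()
      val df1Rep = df1.repartition(32, col("a")).cache()
      val dfJoin = df0Rep.join(df1Rep, "a")

      // Print the physical plan; look for Exchange operators above the cached relations.
      dfJoin.explain()

      dfJoin.count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The task-to-executor assignment for each stage is also visible in the Spark UI, which may be a more direct way to look at locality than the plan text.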