Spark join between two co-partitioned DataFrames

Consider the following join between two DataFrames in Spark 1.6.0:

import org.apache.spark.sql.functions.col

// Repartition both DataFrames by column "a" into 32 partitions and cache them
val df0Rep = df0.repartition(32, col("a")).cache
val df1Rep = df1.repartition(32, col("a")).cache
// Join on the common column "a"
val dfJoin = df0Rep.join(df1Rep, "a")
println(dfJoin.count)
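
A way to inspect this, I think, is to print the physical plan and look for Exchange operators, which mark shuffle boundaries (just a sketch, using the dfJoin defined above):

// Print the physical plan for the join; Exchange operators in the plan
// correspond to shuffle boundaries.
dfJoin.explain()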


Is the join not only co-partitioned but also co-located? I know that for RDDs, when both sides use the same partitioner and are shuffled within the same action, the join will be co-located. But what about DataFrames? Thanks.
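
For comparison, this is the RDD pattern I have in mind (a rough sketch; rdd0 and rdd1 are hypothetical pair RDDs keyed by "a"):

import org.apache.spark.HashPartitioner

// Partition both RDDs with the same partitioner and cache them.
val partitioner = new HashPartitioner(32)
val rdd0Part = rdd0.partitionBy(partitioner).cache
val rdd1Part = rdd1.partitionBy(partitioner).cache

// Because both sides share the partitioner, the join is co-partitioned and
// needs no extra shuffle; co-location additionally depends on where the
// cached partitions were materialized.
val rddJoin = rdd0Part.join(rdd1Part)
println(rddJoin.count)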
