Does Spark co-locate a join between two partitioned DataFrames?

For the following join between two DataFrames in Spark 1.6.0:

import org.apache.spark.sql.functions.col

// Repartition both DataFrames on the join key "a" and cache the results
val df0Rep = df0.repartition(32, col("a")).cache
val df1Rep = df1.repartition(32, col("a")).cache
val dfJoin = df0Rep.join(df1Rep, "a")
println(dfJoin.count)
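
One way to check this empirically, continuing the snippet above (a sketch, not an authoritative answer): print the physical plan and look for extra shuffle operators.

// Continuing from the snippet above. If Spark recognises that both inputs
// are already hash-partitioned on "a", the join's physical plan should not
// insert an additional Exchange (shuffle) above either cached input.
dfJoin.explain()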


Is the join not only co-partitioned but also co-located? I know that for RDDs, when both sides use the same partitioner and are shuffled within the same job, a subsequent join is co-located. But what about DataFrames? Thanks.
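
For reference, the RDD behaviour I have in mind looks like this (a minimal sketch; rdd0 and rdd1 are hypothetical pair RDDs keyed the same way as column "a" above):

import org.apache.spark.HashPartitioner

// rdd0 and rdd1 are hypothetical pair RDDs, e.g. RDD[(Int, String)].
// Partitioning both sides with the same partitioner instance makes the
// join co-partitioned; with both inputs cached in the same application,
// the matching partitions are also co-located, so the join itself
// requires no shuffle stage.
val partitioner = new HashPartitioner(32)
val rdd0Rep = rdd0.partitionBy(partitioner).cache()
val rdd1Rep = rdd1.partitionBy(partitioner).cache()
println(rdd0Rep.join(rdd1Rep).count()) // join runs without a shuffle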

join scala apache-spark apache-spark-sql spark-dataframe




No one has answered this question yet

Check out similar questions:

4331 What is the difference between "INNER JOIN" and "OUTER JOIN"?
1562 What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
894 Difference between JOIN and INNER JOIN
217 Difference between left join and right join in SQL Server
99 How to determine the partitioning of a DataFrame?
5 Number of partitions in a Spark DataFrame
5 Partitioning data for an efficient join of a Spark DataFrame/Dataset
2 PySpark joining shuffled co-partitioned RDDs
1 DataFrame join / groupBy-agg partitioning
0 Spark DataFrame Repartition and Parquet Partition


