Does Spark co-locate a join between two partitioned DataFrames?

For the following join between two DataFrames in Spark 1.6.0:

import org.apache.spark.sql.functions.col

// Repartition both DataFrames on the join key "a" and cache the results
val df0Rep = df0.repartition(32, col("a")).cache
val df1Rep = df1.repartition(32, col("a")).cache
val dfJoin = df0Rep.join(df1Rep, "a")
println(dfJoin.count)
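
One way to check this empirically, continuing the snippet above (a sketch, not an authoritative answer): print the physical plan and look for extra shuffle operators.

// Continuing from the snippet above. If Spark recognises that both inputs
// are already hash-partitioned on "a", the join's physical plan should not
// insert an additional Exchange (shuffle) above either cached input.
dfJoin.explain()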


Is the join not only co-partitioned but also co-located? I know that for RDDs, when both sides use the same partitioner and are shuffled within the same job, a subsequent join is co-located. But what about DataFrames? Thanks.
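
For reference, the RDD behaviour I have in mind looks like this (a minimal sketch; rdd0 and rdd1 are hypothetical pair RDDs keyed the same way as column "a" above):

import org.apache.spark.HashPartitioner

// rdd0 and rdd1 are hypothetical pair RDDs, e.g. RDD[(Int, String)].
// Partitioning both sides with the same partitioner instance makes the
// join co-partitioned; with both inputs cached in the same application,
// the matching partitions are also co-located, so the join itself
// requires no shuffle stage.
val partitioner = new HashPartitioner(32)
val rdd0Rep = rdd0.partitionBy(partitioner).cache()
val rdd1Rep = rdd1.partitionBy(partitioner).cache()
println(rdd0Rep.join(rdd1Rep).count()) // join runs without a shuffle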

join scala apache-spark apache-spark-sql spark-dataframe




No one has answered this question yet

Check out similar questions:

4331 What is the difference between "INNER JOIN" and "OUTER JOIN"?
1562 What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
894 Difference between JOIN and INNER JOIN
217 Difference between left join and right join in SQL Server
99 How to determine the partitioning of a DataFrame?
5 Number of partitions in a Spark DataFrame
5 Partitioning data for an efficient join of a Spark DataFrame/Dataset
2 PySpark joining shuffled co-partitioned RDDs
1 DataFrame join / groupBy-agg partitioning
0 Spark DataFrame Repartition and Parquet Partition


