Why hardcode the repartition value?

Looking at the example Spark code, I can see that the number passed to repartition or coalesce is hardcoded:

val resDF = df.coalesce(16)

What is the best approach to managing this setting once the hardcoded value becomes irrelevant, for example when the cluster can scale up or down in a matter of seconds?



1 answer


Well, examples usually contain hardcoded values, so you don't need to worry; you can change them. The partitioning documentation is full of hardcoded values too, but those values are just examples.

The rule of thumb for the number of partitions:

Your RDD should have as many partitions as the number of executors times the number of cores per executor, multiplied by 3 (or possibly 4). Of course, this is a heuristic and it really depends on your application, dataset, and cluster configuration.
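
One way to avoid the hardcoded 16 is to derive the count from the running SparkContext at the time the job executes. Here is a minimal sketch, assuming the heuristic above and assuming defaultParallelism is a reasonable proxy for executors times cores (its exact meaning depends on the cluster manager); df is just a stand-in for your real DataFrame:

import org.apache.spark.sql.SparkSession

// In spark-shell the session already exists as `spark`; getOrCreate()
// reuses it rather than building a new one.
val spark = SparkSession.builder().getOrCreate()
val df = spark.range(1000000L).toDF("id")  // stand-in for your real DataFrame

// defaultParallelism typically reflects the total cores across the current
// executors (cluster-manager dependent), so the partition count follows the
// cluster instead of being hardcoded.
val heuristicFactor = 3  // the "times 3 (or possibly 4)" from the rule of thumb
val numPartitions = spark.sparkContext.defaultParallelism * heuristicFactor

val resDF = df.repartition(numPartitions)

Keep in mind the value is read once, at the moment that line runs; if the cluster rescales mid-job, the count is stale until it is recomputed.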



Note, however, that repartitioning does not come for free, so in a dynamically scaling environment you need to be sure that the overhead of repartitioning is negligible relative to the gain you get from the operation.

Coalesce and repartition can have different costs, as I mentioned in my answer.
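
A minimal sketch of that difference, continuing from the snippet above (the partition counts are arbitrary):

// coalesce to fewer partitions is a narrow transformation: existing
// partitions are merged without a full shuffle, so it is cheap.
val merged = resDF.coalesce(4)

// repartition always shuffles: more expensive, but it rebalances the data
// evenly and, unlike DataFrame.coalesce, can also increase the count.
val rebalanced = resDF.repartition(64)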
