Number of partitions required for a Spark job

In the 2013 Spark Summit, one of Yahoo's presentations mentioned this formula:

Partitions required = total data size / (memory size / number of cores)

Assuming 64 GB of memory and 16 CPU cores, the presentation says that handling 3 TB of data requires 46080 partitions. I am having a hard time getting to the same result. Please explain how the number 46080 is calculated.


1 answer


Looking at the presentation here, the following information is available:

  • 64 GB of memory per host
  • 16 CPU cores per host
  • 30:1 compression ratio, with 2x in-memory overhead

The formula must use the uncompressed data size, so in this case you need to expand the compressed 3 TB figure first.



Data size = 3 TB * 30 * 2 = 180 TB = 184320 GB

Running through the formula, you get: 184320 GB / (64 GB / 16) = 46080 partitions
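
For concreteness, here is a minimal Scala sketch of the same arithmetic (the object and variable names are illustrative, not from the presentation):

    // Minimal sketch of the partition estimate, using the figures above.
    // All names are illustrative; this is plain arithmetic, not Spark API code.
    object PartitionEstimate extends App {
      val compressedTb     = 3.0    // compressed input data, in TB
      val compressionRatio = 30.0   // 30:1 compression
      val memoryOverhead   = 2.0    // 2x in-memory overhead
      val memoryPerHostGb  = 64.0
      val coresPerHost     = 16.0

      // Uncompressed in-memory size: 3 TB * 30 * 2 = 180 TB = 184320 GB
      val dataSizeGb = compressedTb * compressionRatio * memoryOverhead * 1024

      // Partitions = total data size / (memory available per core)
      val partitions = dataSizeGb / (memoryPerHostGb / coresPerHost)
      println(f"Partitions required: $partitions%.0f")  // prints 46080
    }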
