Number of partitions required for a Spark job
At the 2013 Spark Summit, one of Yahoo's presentations mentioned this formula:
Partitions required = total data size / (memory size / number of cores)
The presentation assumes 64 GB of memory and 16 CPU cores per host, and states that handling 3 TB of data requires 46080 partitions. I am having a hard time arriving at the same result. How is the number 46080 calculated?
1 answer
Looking at the presentation here, the following information is available:
- 64 GB of memory per host
- 16 CPU cores per host
- 30:1 compression ratio, with 2x overhead
The formula must use the uncompressed data size, so in this case the 3 TB of compressed input has to be expanded first:
Data size = 3 TB * 30 * 2 = 180 TB = 184320 GB

Running that through the formula: 184320 GB / (64 GB / 16 cores) = 184320 GB / 4 GB per core = 46080 partitions
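To make the arithmetic easy to verify, here is a minimal Python sketch of the same calculation. Every figure comes from the presentation quoted above; the use of binary units (1 TB = 1024 GB) is an assumption, but it is the only convention that reproduces the 184320 GB total:

```python
# Partition-count arithmetic from Yahoo's Spark Summit 2013 presentation.
# Binary units are assumed (1 TB = 1024 GB), matching the 184320 GB total.

compressed_data_gb = 3 * 1024   # 3 TB of compressed input data
compression_ratio = 30          # 30:1 compression
overhead_factor = 2             # 2x in-memory overhead

memory_per_host_gb = 64
cores_per_host = 16

# Uncompressed, in-memory data size: 3072 * 30 * 2 = 184320 GB
data_size_gb = compressed_data_gb * compression_ratio * overhead_factor

# Memory available to each task: 64 / 16 = 4 GB per core
memory_per_core_gb = memory_per_host_gb / cores_per_host

partitions = data_size_gb / memory_per_core_gb
print(int(partitions))  # 46080
```

Sized this way, each partition's uncompressed data fits within the roughly 4 GB of memory available to a single core.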