How does segmentation improve the running time of the sieve of eratosthenes?

I came across a segmented implementation of the Eratosthenes sieve, which promises is many times faster than the regular version. Can anyone explain how sharding improves runtime? Note that I want to find prime numbers in [1, b] .

He is working on this idea: (for finding primes up to 10 ^ 9)

  • First, we generate simple sieves below sqrt (10 ^ 9) that are needed to cross multiples. Then we start crossing multiples of the first prime 2 until we reach a multiple of 2> = segment_size, if that happens, we calculate the index of that multiple in the next segment using (multiple-segment_size) and store it in a separate array (next []). We then reset the fold of the next sieve tubes using the same procedure. Once we cross the multiples of all sifting primes in the first segment, we iterate over the lattice lattice and print (or count) the primes.

  • To sift the next segment, we reset the sieve array and we increase the bottom offset by segment_size. Then we start crossing multipliers again, for each sieve break we get the sieve index from the next array and we start crossing the multiples from there ...


source to share

2 answers

The segmented sieve performs all the same operations as a regular sieve, so the complexity of the OO time does not change. The difference is in memory usage. If the sieve is small enough to fit into the memory, there is no difference. As the size of the screen increases, the terrain of the link becomes a factor, so the screening process slows down. As a last resort, if the sieve does not fit in memory and must be swapped out to disk, the sieving process will be very slow. The segmented sieve keeps the memory size constant and presumably small, so all sieve calls are local, therefore fast.



Even if the sieve fits completely into RAM, the access location still matters. A C ++ implementation just for Eratosthenes only takes half a minute to sift through the first 2 ^ 32 numbers; the same implementation initializing the same sieve in small 256Kb segments (2 ^ 21 bits, which represent 2 ^ 22 numbers) takes only 8.5 seconds on my aging Nehalem with 256KB L2 cache.

The speedup from sifting in small segments decreasing the cache decreases in the higher regions of the range, as the sieve has to iterate over all factors up to sqrt (n) every time no matter how small or large the segment is. This is most noticeable for segments close to 2 ^ 64, where the small factors are 203,280,221 primes (i.e. a full 32-bit sieve).

Segmented operation still beats full screening. You can sift segments close to 2 ^ 64 to tune millions of primes per second, tens of millions per second in the lower regions. This is counting prime numbers, not raw ore. Full sieves are just not practical after 2 ^ 32 or even so, even if you have a ton of memory.



All Articles