The Sieve of Eratosthenes has a huge amount of "overdraw" - does the Sieve of Sundaram do better?

The standard Sieve of Eratosthenes crosses out most composites several times; in fact, the only ones that get crossed out exactly once are those that are the product of exactly two primes. Naturally, the overdraw grows as the sieve gets bigger.

For an odds-only sieve (i.e. with the even numbers left out), the overdraw reaches 100% around n = 3,509,227, with 1,503,868 composites and 1,503,868 crossings of numbers that had already been crossed out. For n = 2^32 the overdraw rises to 134.25%: 2,610,022,328 redundant crossings against 1,944,203,427 bits that actually need setting (= 2^32 / 2 - 203,280,221).
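To make "overdraw" precise: it can be counted like this (an illustrative sketch only, not my actual sieve code; std::vector<bool> stands in for a proper bitmap and the function name is made up):

#include <cassert>
#include <cstdint>
#include <vector>

// Odds-only Eratosthenes over a bitmap where bit i represents the odd number 2*i + 1.
// Returns the number of redundant crossings (attempts to set an already-set bit).
uint64_t eratosthenes_odds_overdraw (std::vector<bool> &bm, uint32_t max_bit)
{
   assert( max_bit <= UINT32_MAX / 2 );

   uint64_t overdraw = 0;

   bm.assign(max_bit + 1, false);

   for (uint32_t i = 1; 2 * i * (i + 1) <= max_bit; ++i)
   {
      if (bm[i])  continue;                 // 2*i+1 is composite, skip it

      uint32_t n = 2 * i + 1;               // the odd prime represented by bit i

      // start at the bit for n*n, i.e. index (n*n - 1) / 2 = 2*i*(i + 1)
      for (uint32_t k = 2 * i * (i + 1); k <= max_bit; k += n)
      {
         overdraw += bm[k];                 // counts hits on already-set bits
         bm[k] = true;
      }
   }

   return overdraw;
}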

The Sieve of Sundaram - which has a different explanation at maths.org - might be a bit smarter about this, but only if the loop limits are chosen intelligently. That, however, is something the expositions I've seen seem to wave away as an "optimization", and it also looks like the unoptimized Sundaram is beaten by odds-only Eratosthenes every time.

The interesting thing is that both produce exactly the same final bitmap, i.e. one where bit k corresponds to the number 2 * k + 1. So both algorithms must end up setting exactly the same bits; they just go about it in different ways.
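In other words (hypothetical helper names, just to spell out the convention):

#include <cstdint>

// Index convention shared by both sieves: bit k stands for the odd number 2*k + 1.
inline uint32_t bit_to_number (uint32_t k)  { return 2 * k + 1; }
inline uint32_t number_to_bit (uint32_t n)  { return n / 2; }      // n must be odd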

Does anyone have hands-on experience with a competitive, tuned Sundaram? Can it beat the old Greek?

I've slimmed down the code of my small-factor sieve (2^32, Greek only) and tuned the segment size to 256 KB, which is optimal on the old Nehalem with its 256 KB L2 as well as on newer CPUs (although the latter are much more forgiving of bigger segments)... But now I've hit a brick wall, and the bloody sieve still takes 8.5 s to initialize. Loading the sieve from disk is not a very attractive option, and multithreading is hard to do portably (since libraries like Boost tend to throw a monkey wrench into portability)...
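For context, the segmenting looks roughly like this (a sketch rather than my actual code; segment_sieve() is just a placeholder):

#include <cstdint>

// The bitmap is processed in windows sized to fit the 256 KB L2, so each window
// stays cache-resident while all the small primes are applied to it.
void segment_sieve (uint32_t low_bit, uint32_t high_bit);   // crosses off composites in [low_bit, high_bit]

void sieve_in_segments (uint32_t max_bit)
{
   const uint32_t SEGMENT_BITS = 256 * 1024 * 8;   // 256 KB worth of bits = 2,097,152 odd numbers per window

   for (uint32_t low = 0; low <= max_bit; low += SEGMENT_BITS)
   {
      uint32_t high = (max_bit - low < SEGMENT_BITS - 1) ? max_bit : low + SEGMENT_BITS - 1;

      segment_sieve(low, high);
   }
}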

Could Sundaram shave a few seconds off the start-up time?

P.S.: The overdraw as such is not the problem, and it gets absorbed by the L2 cache. The point is that the standard Eratosthenes seems to be doing more than twice the work that is actually needed, which suggests there might be a way to spend less time by doing less work.



1 answer


Since there were no takers for the Sundaram vs. Eratosthenes question, I sat down and analyzed it. The result: the classic Sundaram has strictly higher overdraw than odds-only Eratosthenes; if you apply an obvious small optimization, the overdraw becomes exactly the same - for reasons that will become obvious. If you fix Sundaram so that it avoids the redundant crossings altogether, you end up with something like the Sieve of Pritchard, which is considerably more complicated.

The Lucky's Notes presentation of the Sieve of Sundaram is probably the best; slightly rewritten to use a hypothetical (i.e. non-standard and not provided here) bitmap type bitmap_t, it looks something like the code below. For measuring the overdraw, the bitmap type needs an operation that corresponds to the x86 BTS instruction (bit test and set), which is available via the _bittestandset() intrinsic with Wintel compilers and with the MinGW versions of gcc. The intrinsics are very bad for performance but very handy for counting the overdraw.
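Purely for illustration, a minimal portable stand-in for such a bitmap_t might look like this (my own sketch; for the counts reported below the intrinsic-based bts() would take the place of the plain C++ one):

#include <algorithm>
#include <cassert>   // used by the sieve functions below
#include <cmath>     // ditto (std::sqrt)
#include <cstdint>
#include <vector>

// Just enough interface for the code below. bts() mirrors the BTS semantics
// (set the bit, return its previous value), which makes counting overdraw trivial.
struct bitmap_t
{
   std::vector<uint64_t> words;

   // valid bit indices are 0 .. max_bit
   explicit bitmap_t (uint32_t max_bit) : words((uint64_t(max_bit) + 64) / 64, 0) { }

   void set_all (int value)
   {
      std::fill(words.begin(), words.end(), value ? ~uint64_t(0) : uint64_t(0));
   }

   bool bt (uint32_t k) const                  // bit test
   {
      return (words[k / 64] >> (k % 64)) & 1;
   }

   bool bts (uint32_t k)                       // bit test and set
   {
      uint64_t mask = uint64_t(1) << (k % 64);
      bool old = (words[k / 64] & mask) != 0;
      words[k / 64] |= mask;
      return old;
   }
};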

Note: to sieve all primes up to N, one would call the function with max_bit = N / 2; if bit i of the resulting bitmap is set, then the number 2 * i + 1 is composite. The function has "31" in its name because the index math breaks down for bitmaps bigger than 2^31 bits; hence this code can only handle numbers up to 2^32 - 1 (corresponding to max_bit <= 2^31 - 1).

  
uint64_t Sundaram31_a (bitmap_t &bm, uint32_t max_bit)
{
   assert( max_bit <= UINT32_MAX / 2 );

   uint32_t m = max_bit;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i < m / 2; ++i)
   {
      for (uint32_t j = i; j <= (m - i) / (2 * i + 1); ++j)
      {
         uint32_t k = i + j + 2 * i * j;

         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

      

The loop limit Lucky ties to j is exact, but the one on i is very loose. Tightening it, and dropping the alias m that I had added to make the code look more like the common expositions on the web, we get:

uint64_t Sundaram31_b (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      for (uint32_t j = i; j <= (max_bit - i) / (2 * i + 1); ++j)
      {
         uint32_t k = i + j + 2 * i * j;

         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

      
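For the record, the bound on i comes from requiring the inner loop to execute at least once; the smallest index written for a given i occurs at j = i:

\[
k_{\min}(i) = i + i + 2i^2 = 2i(i+1) \le \mathit{max\_bit}
\;\Longleftrightarrow\;
(2i+1)^2 \le 2\,\mathit{max\_bit} + 1
\;\Longleftrightarrow\;
i \le \frac{\sqrt{2\,\mathit{max\_bit}+1} - 1}{2}
\]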

The assert was dropped to reduce the noise, but it is in fact still valid and necessary. Now it is time for a bit of strength reduction, turning the multiplication into iterated addition:



uint64_t Sundaram31_c (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      uint32_t n = 2 * i + 1;
      uint32_t k = n * i + i;   // <= max_bit because that's how we computed i_max
      uint32_t j_max = (max_bit - i) / n;

      for (uint32_t j = i; j <= j_max; ++j, k += n)
      {
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

      

Converting the loop condition to use k allows us to get rid of j; things should be looking extremely familiar by now ...

uint64_t Sundaram31_d (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      uint32_t n = 2 * i + 1;     
      uint32_t k = n * i + i;   // <= max_bit because that's how we computed i_max

      for ( ; k <= max_bit; k += n)
      {        
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

      

With the code looking the way it does now, it is time to check whether an obvious small change can be justified mathematically: skipping the loop for i whenever 2 * i + 1 is already known to be composite. The proof is left as an exercise for the reader ...

uint64_t Sundaram31_e (bitmap_t &bm, uint32_t max_bit)
{
   uint32_t i_max = uint32_t(std::sqrt(double(2 * max_bit + 1)) - 1) / 2;
   uint64_t overdraw = 0;

   bm.set_all(0);

   for (uint32_t i = 1; i <= i_max; ++i)
   {
      if (bm.bt(i))  continue;   // 2*i+1 is composite; everything it would cross out is already crossed out

      uint32_t n = 2 * i + 1;     
      uint32_t k = n * i + i;   // <= max_bit because that's how we computed i_max

      for ( ; k <= max_bit; k += n)
      {        
         overdraw += bm.bts(k);
      }
   }

   return overdraw;
}

      

At this point the only thing that still differs from the classic odds-only Eratosthenes (apart from the name) is the starting value for k, which is usually (n * n) / 2 for the Old Greek. However, substituting n = 2 * i + 1 shows that the difference is just 1/2, which integer division rounds down to 0. Hence Sundaram is simply odds-only Eratosthenes without the "optimization" of skipping composite values of i in order to avoid at least some of the crossings of already crossed-out numbers. The value of i_max is the same as the Greek's max_factor_bit; it is just arrived at via completely different logical steps and computed with a slightly different formula.
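Spelled out with n = 2 * i + 1, the two starting values coincide:

\[
\left\lfloor \frac{n^2}{2} \right\rfloor
= \left\lfloor \frac{4i^2 + 4i + 1}{2} \right\rfloor
= 2i^2 + 2i
= (2i+1)\,i + i
= n\,i + i
\]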

P.S.: after seeing overdraw so many times in the code, people probably want to know what it actually amounts to ... Sieving the numbers up to 2^32 - 1 (i.e. a full bitmap of 2^31 bits), Sundaram has an overdraw of 8,643,678,027 (roughly 2 * 2^32) or 444.6%; with the slight tweak that turns it into odds-only Eratosthenes, the overdraw becomes 2,610,022,328 or 134.2%.
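A hypothetical driver for reproducing counts of this kind (the full run needs a 2^31-bit bitmap, i.e. 256 MB, and takes a while; for a quick check use a smaller max_bit). It assumes the bitmap_t sketch and the Sundaram31_* functions above are in one translation unit:

#include <cstdint>
#include <cstdio>

int main ()
{
   uint32_t max_bit = UINT32_MAX / 2;              // covers all odd numbers below 2^32

   bitmap_t bm(max_bit);

   uint64_t classic = Sundaram31_a(bm, max_bit);   // classic Sundaram
   uint64_t tweaked = Sundaram31_e(bm, max_bit);   // with the composite-skipping tweak

   std::printf("classic Sundaram overdraw: %llu\n", (unsigned long long) classic);
   std::printf("tweaked Sundaram overdraw: %llu\n", (unsigned long long) tweaked);

   return 0;
}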
