Checksumming large swathes of prime numbers? (for verification)

Are there any clever algorithms for computing high-quality checksums over millions or billions of prime numbers? That is, with maximum error-detection capability and perhaps segmentable?

Motivation:

Small primes - up to 64 bits in size - can be sieved on demand at a rate of millions per second, using a small bitmap for sieving potential factors (up to 2^32-1) and a second bitmap for sieving the numbers in the target range.
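For readers unfamiliar with the technique, here is a minimal sketch of such a windowed sieve - not my actual code, just the two-bitmap idea, with all names made up for the illustration and the factor bound shrunk for brevity:

// Minimal sketch of the kind of windowed sieve described above.
#include <cstdint>
#include <cstdio>
#include <vector>

// Bitmap of odd primes up to 'limit' (index i represents the odd number 2*i + 1).
static std::vector<bool> sieve_small_factors (uint32_t limit)
{
   std::vector<bool> is_prime(limit / 2 + 1, true);
   is_prime[0] = false;                                   // 1 is not prime
   for (uint64_t i = 3; i * i <= limit; i += 2)
      if (is_prime[i / 2])
         for (uint64_t j = i * i; j <= limit; j += 2 * i)
            is_prime[j / 2] = false;
   return is_prime;
}

// Sieve the 'count' odd numbers starting at 'lo' (odd) using the factor bitmap.
static void sieve_window (uint64_t lo, uint32_t count, std::vector<bool> const &factor_map)
{
   uint64_t hi = lo + 2 * uint64_t(count);                // exclusive upper bound
   std::vector<bool> composite(count, false);             // index k represents lo + 2*k
   for (uint64_t f = 3; f * f < hi; f += 2)
   {
      if (!factor_map[f / 2])  continue;
      uint64_t m = ((lo + f - 1) / f) * f;                // first multiple of f >= lo
      if (m < f * f)     m = f * f;                       // don't cross off f itself
      if ((m & 1) == 0)  m += f;                          // only odd multiples matter
      for ( ; m < hi; m += 2 * f)
         composite[(m - lo) / 2] = true;
   }
   for (uint32_t k = 0; k < count; ++k)
      if (!composite[k])
         std::printf("%llu\n", (unsigned long long)(lo + 2 * k));
}

int main ()
{
   std::vector<bool> factors = sieve_small_factors(1u << 16);   // enough for targets < 2^32
   sieve_window(1000001, 50, factors);                          // primes among 50 odd candidates
}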

The algorithm and its implementation are simple and straightforward enough, but the devil is in the details: values tend to push against or exceed the limits of the built-in integral types everywhere, boundary cases abound (so to speak), and even differences in floating point strictness can break things if the programming is not suitably defensive. Not to mention the mayhem that an optimising compiler can wreak, even on already-compiled, already-tested code sitting in a static lib (if link-time code generation is used). Not to mention that faster algorithms tend to be a lot more complicated, and thus even more fragile.

This has two consequences: test results are basically meaningless unless the tests are run using the final executable image, and it is highly desirable to verify correct operation at runtime, during normal use.

Checking against pre-computed values would give the highest degree of confidence, but the required files are big and clunky. A text file with 10 million primes is on the order of 100 MB uncompressed and more than 10 MB compressed; storing byte-encoded differences requires one byte per prime, and entropy coding can at best reduce the size to half (5 MB for 10 million primes). Hence even a file covering only the small factors up to 2^32 - roughly 203 million primes - would weigh in at about 100 MB, and the complexity of the decoder would exceed that of the windowed sieve itself.

This means that checking against files is not feasible except as a final release check for a newly-built executable. Not to mention that trustworthy files are not easy to come by. The Prime Pages offer files for the first 50 million primes, and even the amazing primos.mat.br goes only up to 1,000,000,000,000. This is unfortunate, since many of the boundary cases (== need for testing) occur between 2^62 and 2^64-1.

That leaves checksumming. That way the space requirements would be marginal, and proportional only to the number of test cases. I don't want to require that a decent checksum like MD5 or SHA-256 be available, and with the target numbers all being prime it should be possible to generate a high-quality, high-resolution checksum with a few simple operations on the numbers themselves.

This is what I have come up with so far. The raw digest consists of four 64-bit numbers; at the end it can be folded down to the desired size.

   for (unsigned i = 0; i < ELEMENTS(primes); ++i)
   {
      digest[0] *= primes[i];              // running product (must be initialised to 1)
      digest[1] += digest[0];              // sum of sequence of running products
      digest[2] += primes[i];              // running sum
      digest[3] += digest[2] * primes[i];  // Hornerish sum
   }

      

With only two (independent) multiplications per update the speed is quite decent, and - with the exception of the simple sum - each of the components has so far uncovered every error I tried to sneak past the digest. However, I am not a mathematician, and empirical testing is no guarantee of efficacy.

Are there some mathematical properties that can be exploited to design - rather than 'cook up', as I did - a sensible, reliable checksum?

Is it possible to design the checksum in a way that makes it steppable, in the sense that subranges can be processed separately and the results then combined with a bit of arithmetic to give the same result as if the whole range had been checksummed in one go? The same capability that all advanced CRC implementations tend to have nowadays, in order to enable parallel processing.

EDIT The rationale for the current scheme is this: the count, the sum and the product do not depend on the order in which the primes are added to the digest; they can be computed on separate blocks and then combined. The checksum does depend on the order; that is its raison d'être. However, it would be nice if the two checksums of two consecutive blocks could somehow be combined to give the checksum of the combined block.

Sometimes the count and the sum can be verified against external sources, like certain sequences on oeis.org, or against sources like the batches of 10 million primes at primos.mat.br (the index gives the first and last prime, the count is implied == 10 million). No such luck for the product and the checksum, though.

Before I throw major time and computing horsepower at the computation and verification of digests covering the whole range of small factors up to 2^64, I would like to hear what the experts think about this...

The scheme I am currently testing in 32-bit and 64-bit variants looks like this:

template<typename word_t>
struct digest_t
{
   word_t count;
   word_t sum;
   word_t product;
   word_t checksum;

   // ...

   void add_prime (word_t n)
   {
      count    += 1;
      sum      += n;
      product  *= n;
      checksum += n * sum + product;
   }
};

      

This has the advantage that the 32-bit digest components are equal to the lower halves of the corresponding 64-bit values, meaning only the 64-bit digests need to be computed, even when a fast 32-bit verification is all that is required. A 32-bit version of the digest can be found in this simple sieve test program @ pastebin, for hands-on experimentation. The full Monty, in a revised, templated version, can be found in a newer paste for a sieve that works up to 2^64-1.
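For illustration only - this helper is not part of the linked pastes and assumes digest_t<> stays the plain aggregate shown above, with <cstdint> included:

digest_t<uint32_t> fold_to_32 (digest_t<uint64_t> const &d)
{
   // Truncation is enough: +, * modulo 2^64 agree with +, * modulo 2^32
   // in the low 32 bits, so the 32-bit digest is just the low halves.
   digest_t<uint32_t> r;
   r.count    = uint32_t(d.count);
   r.sum      = uint32_t(d.sum);
   r.product  = uint32_t(d.product);
   r.checksum = uint32_t(d.checksum);
   return r;
}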





3 answers


I've done a bit of work parallelizing operations on the Cell processor. Something like the following should work here.

In this case I would use a hash function that is fast and possibly incremental (such as xxHash or MurmurHash3), plus a hash list (which is a less flexible specialization of a Merkle tree).

These hashes are extremely fast; it is going to be surprisingly hard to do better with some simple set of operations of your own. The hash list provides parallelism - different blocks of the list can be processed by different threads, and then the hashes are themselves hashed. You could also use a Merkle tree, but I suspect it would just be more complex without much benefit.



  • Virtually divide your range into aligned blocks - call these microblocks. (For example, a microblock could be a range such as [n << 15, (n + 1) << 15).)
  • To process a microblock, compute what needs to be computed, add it to a buffer, and hash the buffer. (An incremental hash function will allow a smaller buffer; the buffer does not have to be filled with the same length of data every time.)
  • Each microblock hash is placed in a circular buffer.
  • Divide the circular buffer into hashable blocks ("macroblocks"). Incrementally hash these macroblocks in the proper order as they become available, or when there are no more microblocks left.
  • The resulting hash value is the one you want. (A rough single-threaded sketch of this scheme follows this list.)
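Purely as an illustration of the structure - a toy mixer stands in for a real fast hash such as xxHash or MurmurHash3, and all names and block sizes are invented for this sketch:

#include <cstddef>
#include <cstdint>
#include <vector>

static uint64_t mix (uint64_t h, uint64_t v)       // placeholder, NOT a production hash
{
   h ^= v + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2);
   return h;
}

// Hash one microblock: here, the primes found in [lo, lo + span).
static uint64_t hash_microblock (uint64_t lo, uint64_t span,
                                 std::vector<uint64_t> const &primes_in_block)
{
   uint64_t h = mix(lo, span);                     // bind the hash to the block's position
   for (uint64_t p : primes_in_block)
      h = mix(h, p);
   return h;
}

// Combine microblock hashes into macroblock hashes, then into one final hash.
// In the threaded version the microblock hashes land in a circular buffer and the
// macroblocks are folded in as soon as they are complete; the order is the same.
static uint64_t hash_list (std::vector<uint64_t> const &microblock_hashes,
                           size_t microblocks_per_macroblock)
{
   uint64_t final_hash = 0;
   for (size_t i = 0; i < microblock_hashes.size(); i += microblocks_per_macroblock)
   {
      uint64_t macro = 0;
      for (size_t j = i; j < microblock_hashes.size() && j < i + microblocks_per_macroblock; ++j)
         macro = mix(macro, microblock_hashes[j]);
      final_hash = mix(final_hash, macro);
   }
   return final_hash;
}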

Some additional notes:

  • I recommend a design where threads reserve a range of pending microblocks that the circular buffer has room for, process them, dump the values into the circular buffer, and repeat.
  • This has the added benefit that you can decide how many threads you want to use on the fly - e.g. when requesting a new range of microblocks, each thread could detect that there are too many/few threads running and adjust.
  • I personally would have the thread that adds the last microblock hash to a macroblock also clean up that macroblock. Fewer parameters to tune that way.
  • Maintaining a circular buffer is not as hard as it sounds - the lowest-ordered macroblock that is still unprocessed defines what portion of "macroblock space" the circular buffer represents. All you need is a simple counter that increments when appropriate to express this.
  • Another benefit is that since the threads go through a reserve/work/reserve/work cycle on a regular basis, a thread that is unexpectedly slow will not hinder the overall running time nearly as badly.
  • If you want to make something less robust but easier, you can forgo a good deal of the work by using a "striped" pattern - decide on the maximum number of threads (N), and have each thread process every N-th microblock (offset by its thread "ID") and hash the resulting macroblocks per thread instead. Then, at the end, hash the macroblock hashes from the N threads. If you have fewer than N threads, you can divide the work up among the number of threads you do want (e.g. 64 maximum threads, but three actual threads: thread 0 processes 21 virtual threads, thread 1 processes 21 virtual threads, and thread 2 processes 22 virtual threads - not ideal, but not terrible). This is essentially a shallow Merkle tree instead of a hash list.




Kaganar's excellent answer demonstrates how to make things work even if the digests for adjacent blocks cannot be combined mathematically to give the same result as if the combined block had been digested.

The only downside of his solution is that the resulting block structure is by necessity rather rigid, rather like PKI with its official, all-encompassing hierarchy of certifications versus PGP's "guerrilla style" web of trust, which covers only the few subjects of actual interest. In other words, it requires devising a global addressing structure/hierarchy.

This is the digest in its current form; the change is that the order-dependent part has been simplified to its essential minimum:

void add_prime (word_t n)
{
   count    += 1;
   sum      += n;
   product  *= n;
   checksum += n * count;
}

      

Here are the lessons learned from practical work with this digest:

  • count, sum and product (i.e. the partial primorial modulo word size) proved to be extremely useful because they relate to things also found elsewhere in the world, like certain lists at the OEIS
  • count and sum were very useful because the former tends to be naturally available when manipulating (generating, using, comparing) batches of primes, and the sum is easily computed on the fly with zero effort; this allows partial verification against existing results without going through the whole rigmarole of creating and updating a digest, and without the overhead of two - comparatively slow - multiplications
  • count is also exceedingly useful because any indexing superstructure built on top of systems of digests must by necessity contain it, and conversely it can guide a search straight to the block (range) containing the n-th prime, or to the blocks overlapped by the n-th through (n+k)-th primes
  • the order dependency of the fourth component (checksum) turned out to be less of a hindrance than anticipated, since small primes tend to "occur" (be generated or used) in order, in situations where verification might be desired
  • the checksum's order dependency - and its lack of combinability - makes it perfectly useless outside of the specific block for which it was generated
  • fixed-size auxiliary structures - like the ubiquitous small-factor bitmaps - are best verified as raw memory during a startup self-test, instead of running a prime digest over them; this drastically reduces complexity and speeds things up by several orders of magnitude


For many practical purposes the order-dependent checksum could simply be dropped, leaving a three-component digest that can trivially be combined for adjacent ranges.
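As a sketch - not code from the linked pastes - the order-independent part of two block digests could be merged like this, with everything taken modulo 2^64:

digest_t<uint64_t> combine_characteristic (digest_t<uint64_t> a, digest_t<uint64_t> const &b)
{
   a.count   += b.count;     // counts add
   a.sum     += b.sum;       // sums add
   a.product *= b.product;   // products multiply (everything modulo 2^64)
   // a.checksum is NOT combinable like this; it remains valid only for block a.
   return a;
}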

For verification of fixed ranges (as in self-tests) the checksum component is still useful. Any other checksum - the moral equivalent of a CRC - would be just as useful for that, and probably faster. It would be even more useful if an order-independent (combinable) way could be found to supplement the resolution of the first three components. Extending the resolution beyond the first three components matters most for bigger computing efforts, like sieving, verifying and digesting trillions of primes for posterity.

One such candidate for an order-independent, combinable fourth component is the sum of squares.
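As a sketch of what that would look like - sums of squares, taken modulo 2^64, combine across blocks by simple addition, just like the plain sums (the sum_sq member is hypothetical):

void add_prime (word_t n)
{
   count   += 1;
   sum     += n;
   product *= n;
   sum_sq  += n * n;   // hypothetical order-independent fourth component:
                       // the sums of squares of two blocks simply add (modulo 2^64)
}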

On the whole the digest has turned out to be quite useful as it is, despite the shortcomings concerning the checksum component. The best way of looking at the digest is probably as consisting of a "characteristic" part (the first three components, which can be combined) and a checksum part that is relevant only for the specific block. The latter could just as well be replaced by a hash of any desired resolution. Kaganar's solution indicates how this checksum/hash can be integrated into a scheme that extends beyond a single block, despite its inherent non-combinability.

The summary of prime number sources seems to have fallen by the wayside, so here it is:

  • up to 1,000,000,000,000 available as files from sites like primos.mat.br
  • up to 2^64-10*2^64 in super-fast bulk via the primesieve.org console program (pipe)
  • up to 2^64-1 - and beyond - via the gp/PARI program (pipe, about 1 million primes per minute)




I'll answer this question a second time, since this approach is quite different and hopefully better:

It occurred to me that what you are doing is basically looking for a checksum, not over a list of primes, but over a range of a bitfield where a number is either prime (bit set to 1) or not (bit set to 0). You are going to have a lot more 0s than 1s for any interesting range, so hopefully you only have to do an operation for the 1s.

Typically the problem with using a trivial hash in any old order is that it handles multiplicity poorly and is oblivious to order. But you don't care about either of those problems here - every bit can only be set or unset once.

From that point of view, a bitwise exclusive-or or addition should be just fine if combined with a good hash function of the bit index, i.e. of the prime that was found. (If your primes are 64-bit you could go with some of the functions here.)

So, for the utmost simplicity that will give you the same value for any set of input ranges: yes, stick with hashing and combining the hashes with a simple operation, as you are. But switch to a traditional hash function that appears "random" given its input - hash64shift on the linked page is probably what you are looking for. The probability of a meaningful collision is remote. Most hash functions stink, though - make sure you pick one that is known to have good properties (avalanches well, etc.). Thomas Wang's usually aren't too bad. (Bob Jenkins' are fantastic, but he sticks mostly to 32-bit functions; although his mix function on the linked page is very good, it is probably overkill here.)
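For concreteness, a sketch of that scheme - the 64-bit mixer below is the widely circulated hash64shift attributed to Thomas Wang, reproduced here from memory, so check it against the linked page before relying on it:

#include <cstddef>
#include <cstdint>

// Thomas Wang's 64-bit integer hash ("hash64shift"), reproduced from memory.
static uint64_t hash64shift (uint64_t key)
{
   key = (~key) + (key << 21);             // key = (key << 21) - key - 1
   key =  key ^ (key >> 24);
   key = (key + (key << 3)) + (key << 8);  // key * 265
   key =  key ^ (key >> 14);
   key = (key + (key << 2)) + (key << 4);  // key * 21
   key =  key ^ (key >> 28);
   key =  key + (key << 31);
   return key;
}

// Order-independent digest of a batch of primes: hash each prime and fold the
// results with XOR. Partial digests of subranges combine with the same XOR,
// in any order, and give the same result as digesting the whole range at once.
static uint64_t digest_primes (const uint64_t *primes, size_t count)
{
   uint64_t h = 0;
   for (size_t i = 0; i < count; ++i)
      h ^= hash64shift(primes[i]);
   return h;
}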

Parallelizing the verification is obviously trivial, the code size and effort are greatly reduced compared to my other answer, and there is much less synchronization and almost no buffering that needs to happen.









