Why does gzip use CRC instead of a generic hashing algorithm?

What is the difference between a digital signature and a codeword?

CONTEXT: I recently had to do a fair amount of work with gzipped files. One interesting thing I found while reading the Python zlib documentation is the statement that CRC should not be used as a general hashing algorithm. This got me thinking: what's the point of CRC if it's not a generic hashing algorithm? Shouldn't you test for equality?

+3




2 answers


CRC in zip files is mainly used to make sure the file is not damaged during storage or transit. It is not used to provide authenticity or to protect against files that are being modified by an attacker. Therefore, cryptographic security is not required.

Cryptographic hashes provide the same or better protection against accidental corruption. However, they are more computationally expensive. If the hash output is truncated too far, a CRC can actually be better at detecting (random) changes, because a CRC guarantees detection of certain error classes, such as short bursts, which a truncated hash does not.

Since the CRC value does not protect against deliberate changes - it is not difficult to find files that will generate the same CRC value - it is not suitable for digital signatures. For this, you need a cryptographic hash.



Please note that a cryptographic hash is not a signature. For signatures, you need a digital signature application such as PGP. Digital signatures (usually) consist of a hash that is then processed with the sender's private key; the receiver verifies the result with the matching public key.


Note: Sometimes the word "signature" is used to mean "fingerprint". Fingerprints are computed with a cryptographic hash; MD5 and SHA-1 are still the ones mostly in use. But this is a rather loose and, in my opinion, incorrect use of the word "signature".
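A minimal sketch of the fingerprint use case, using SHA-256 from Python's standard library rather than MD5 or SHA-1. The names `fingerprint`, `payload` and `published` are illustrative only, and the sketch assumes the published digest reaches the receiver over a separate, trusted channel:

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

payload = b"example release archive contents"  # hypothetical data
published = fingerprint(payload)  # assumed to arrive via a trusted channel

# The receiver recomputes the digest and compares it with the published one.
matches = hmac.compare_digest(fingerprint(payload), published)
tampered_matches = hmac.compare_digest(fingerprint(payload + b"!"), published)

print(matches, tampered_matches)
```

Anyone can compute this digest, so it is only a fingerprint; it becomes part of a signature only once a private key is involved.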

+6




You asked four or five different questions here and used a whole bunch of different, sometimes ambiguous, terms in context. Better to ask one clear question at a time on a Q&A site.

  • Why does gzip use CRC instead of generic hashing algorithm?

CRC is well suited to error detection and is relatively fast to compute. The input bits are well distributed across the CRC value, and it also has good burst error detection capability. It is not clear what you mean by a generic hashing algorithm (one for hash tables only, or a cryptographically strong one?). In any case, the goal is not to build a lookup table of files but to check the data for damage. There would be no value in putting a cryptographic hash, e.g. MD5 or SHA-2, in the gzip file anyway, since someone could just change both the data and the hash! A hash that anyone can generate is only useful if you obtain the hash over another channel.
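To make the gzip case concrete, here is a small sketch that reads the CRC stored in a gzip stream and compares it with `zlib.crc32` of the uncompressed data. It relies on the gzip format (RFC 1952) ending each member with a little-endian CRC-32 and length; the payload and variable names are made up for illustration:

```python
import gzip
import struct
import zlib

data = b"hello, gzip " * 100  # hypothetical payload
blob = gzip.compress(data)

# The last 8 bytes of a gzip member: CRC-32, then size mod 2**32 (little-endian).
stored_crc, stored_size = struct.unpack("<II", blob[-8:])
crc_ok = stored_crc == zlib.crc32(data)
size_ok = stored_size == len(data) % 2**32

# Simulate damage by flipping one bit of the stored CRC.
damaged = bytearray(blob)
damaged[-8] ^= 0x01
try:
    gzip.decompress(bytes(damaged))
    detected = False
except (OSError, zlib.error):
    # gzip.BadGzipFile is a subclass of OSError
    detected = True

print(crc_ok, size_ok, detected)
```

Since the CRC sits right next to the data, an attacker who rewrites the data can rewrite the trailer as well, which is why it only guards against accidental damage.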

  • What is the difference between a digital signature and a code word?

By "code word", I suppose you mean the CRC from the link. A CRC is a quickly computed error-checking code, implemented in both hardware and software, that is used to check data integrity. From the context, I think you mean a cryptographic hash when you say digital signature. A cryptographic hash is a one-way function, designed so that constructing a message with a given hash value is extremely difficult. CRC is the exact opposite: being a linear function, it makes it quite easy to modify a message so that it has a given CRC. A cryptographic hash usually has many more bits than a CRC, so the chance of an accidental collision is virtually zero. However, the one-way property and the larger number of bits make a cryptographic hash much more computationally expensive to generate.
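The linearity can be observed directly with `zlib.crc32`. Strictly speaking, CRC-32 as used by zlib is affine rather than linear, because of its initial value and final XOR, so the sketch below XORs an odd number of equal-length messages, where the constant terms cancel down to one. The helper `xor_bytes` is a hypothetical name introduced for this example:

```python
import os
import zlib

def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

# Three random messages of the same length.
a, b, c = (os.urandom(64) for _ in range(3))

# The CRC of the XOR equals the XOR of the CRCs.
lhs = zlib.crc32(xor_bytes(a, b, c))
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
linear = lhs == rhs

print(linear)
```

This predictable structure is what lets an attacker work out how to alter a message to reach a chosen CRC, and it is exactly what a cryptographic hash is designed not to have.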

  • One interesting thing that I discovered while reading the Python zlib documentation is the statement that CRC should not be used as a general hashing algorithm.


That is right. While a CRC distributes the input bits across its value very well and can be used successfully as a hash, it fails certain tests that you might want a hashing algorithm to pass. If you want an algorithm for building a hash table from keys, use a hash algorithm designed for that purpose. If that is all you need it for, the hash need not and should not be cryptographic. Speed matters more.

  • What's the point of the CRC if it's not a generic hashing algorithm?

CRC is one of the results of coding theory, which provides algorithms for error detection and correction. The purpose of a CRC is to detect errors.

  • Shouldn't you test for equality?

It is not clear what equality you mean here. In any case, the point of the CRC is to check integrity. It adds redundant information to a stream, which allows almost any unintended damage to that stream in transit to be detected.
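As a rough illustration of that detection capability, the sketch below damages random data with error patterns confined to four consecutive bytes and counts how often the CRC fails to notice. A 32-bit CRC is expected to catch every error that fits within 32 consecutive bits, so the count should stay at zero. The helper `corrupt`, the data size and the fixed seed are arbitrary choices for the example:

```python
import os
import random
import zlib

original = os.urandom(1024)
reference = zlib.crc32(original)

def corrupt(data: bytes, offset: int, pattern: bytes) -> bytes:
    """XOR an error pattern into data starting at the given byte offset."""
    damaged = bytearray(data)
    for i, p in enumerate(pattern):
        damaged[offset + i] ^= p
    return bytes(damaged)

rng = random.Random(0)  # fixed seed so the run is repeatable
undetected = 0
for _ in range(1000):
    offset = rng.randrange(len(original) - 3)
    # Non-zero error confined to 4 consecutive bytes (32 bits).
    pattern = rng.randrange(1, 2**32).to_bytes(4, "big")
    if zlib.crc32(corrupt(original, offset, pattern)) == reference:
        undetected += 1

print(undetected)
```

Longer or scattered errors are not covered by this guarantee, but for random damage the chance of an unchanged 32-bit CRC is still only about 1 in 2**32.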

+5

