Fastest method to fingerprint an array (calculating a unique hash from an array of data)

I use a lot of caching and buffering of API calls in my WWW Framework, and one of the things I end up doing all over the place is fingerprinting data, both to name cache files and to detect API calls that have already been made.

Lots of data moves around in arrays such as GET and POST input, so whether an API call is unique depends on that data.

I therefore need to fingerprint this information: generate a "fingerprint" from an array of data by hashing it into a string that I can store and compare against.

PHP offers serialize() and json_encode() for serializing an array. After various tests, I found json_encode() to be the faster of the two and am quite happy with it.
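A comparison like the one described above could be reproduced with a rough micro-benchmark along these lines. It is only a sketch, assuming PHP 7.4+ (for arrow functions; hrtime() needs 7.3+); the sample array, the iteration count and the helper name timeIt() are arbitrary choices for illustration, and timings will vary with the PHP version and the shape of your data.

```php
<?php
// Rough micro-benchmark sketch: serialize() vs json_encode() on one array.
// timeIt() is a hypothetical helper, not part of PHP.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6; // elapsed milliseconds
}

// Arbitrary sample data shaped loosely like request parameters.
$sample = [
    'action'  => 'list',
    'filters' => ['status' => 'active', 'tags' => ['a', 'b', 'c']],
    'page'    => 3,
];

$iterations = 100000;
$results = [
    'serialize'   => timeIt(fn() => serialize($sample), $iterations),
    'json_encode' => timeIt(fn() => json_encode($sample), $iterations),
];

foreach ($results as $name => $ms) {
    printf("%-12s %8.2f ms\n", $name, $ms);
}
```

Running it a few times on your own data is more telling than a single run, since the first iterations can include warm-up effects.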

For hashing there are the md5() and sha1() functions, of which md5() is faster and meets my criteria.
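For reference, here is what the two functions produce for the same serialized input. This is a small illustration, not a benchmark; the sample array is arbitrary. Both functions take an optional second argument that returns raw binary bytes instead of hex, which halves the stored size.

```php
<?php
// Compare the output of md5() and sha1() for the same JSON string.
$json = json_encode(['user' => 42, 'action' => 'list']);

$md5Hex  = md5($json);       // 128-bit digest as 32 hex characters
$sha1Hex = sha1($json);      // 160-bit digest as 40 hex characters
$md5Raw  = md5($json, true); // same 128-bit digest as 16 raw bytes

printf("md5:     %s (%d chars)\n", $md5Hex, strlen($md5Hex));
printf("sha1:    %s (%d chars)\n", $sha1Hex, strlen($sha1Hex));
printf("md5 raw: %d bytes\n", strlen($md5Raw));
```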

So my current fingerprint algorithm is:

$fingerprint=md5(json_encode($array));
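Wrapped in a function, that one-liner might look like the sketch below. The function name fingerprint() is hypothetical, and JSON_THROW_ON_ERROR (PHP 7.3+) is an added assumption so that an encoding failure (for example invalid UTF-8) throws instead of returning false. The usage lines also show a property of this approach: json_encode() keeps key order, so the same pairs in a different order give a different fingerprint.

```php
<?php
// Hypothetical wrapper around md5(json_encode(...)).
function fingerprint(array $data): string
{
    // Throw on encoding errors rather than hashing a failed result.
    return md5(json_encode($data, JSON_THROW_ON_ERROR));
}

$a = ['page' => 2, 'sort' => 'name'];
$b = ['page' => 2, 'sort' => 'name'];
$c = ['sort' => 'name', 'page' => 2]; // same pairs, different key order

echo fingerprint($a), "\n";
var_dump(fingerprint($a) === fingerprint($b)); // bool(true)
var_dump(fingerprint($a) === fingerprint($c)); // bool(false), key order differs
```

If parameter order should not matter for your cache, one option is to ksort() the array (recursively for nested arrays) before encoding; that is an extra step beyond the approach above.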

      

But I am wondering if this is the fastest way to fingerprint an array in PHP. I've searched Google and Stack Overflow but haven't found any good alternatives. Am I on the right track, or do I need to do something different?



2 answers


Once you have an array json_encoded, you should probably go with a non-cryptographic hash function if you're primarily concerned with speed. Different hash functions are good for different things. MD5 and SHA-1 are referred to as cryptographic because they are designed to be difficult to reverse (note that they are widely considered deprecated for security purposes due to known vulnerabilities). CRC (Cyclic Redundancy Check) functions are error-detection codes and won't fare well on uniqueness anyway.

Wikipedia is a decent place to start, if only because the entries there usually link to library implementations: List of hash functions. I would recommend reading up on some of the non-cryptographic functions and comparing them. Non-cryptographic functions are written for speed and a reasonable degree of uniqueness, sacrificing security, error detection and other properties; from your description, speed and uniqueness are exactly what you want.
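As a sketch of swapping in a non-cryptographic hash through PHP's hash() function: this assumes PHP 8.1+, where ext/hash includes xxHash (xxh32, xxh64, xxh3, xxh128) and MurmurHash3 (murmur3a, murmur3c, murmur3f). The preference order and the function name fastFingerprint() are hypothetical; on older PHP versions none of these names appear in hash_algos(), so the sketch falls back to md5.

```php
<?php
// Hypothetical drop-in for md5(json_encode(...)) using a non-cryptographic
// hash when the running PHP build provides one.
function fastFingerprint(array $data): string
{
    $json = json_encode($data, JSON_THROW_ON_ERROR);
    $available = hash_algos();
    // All three candidates produce 128-bit (32 hex character) output, the
    // same size as md5. The order is an assumption; benchmark your own data.
    foreach (['xxh128', 'murmur3f'] as $algo) {
        if (in_array($algo, $available, true)) {
            return hash($algo, $json);
        }
    }
    return md5($json); // fallback for PHP < 8.1
}

echo fastFingerprint(['page' => 2, 'sort' => 'name']), "\n";
```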

One final point to keep in mind if you're mainly concerned about speed is how you are going to store and compare the fingerprints themselves. MD5 outputs 128 bits of data, which won't fit into a numeric type in PHP without additional library calls and overhead. If I had to bet, I would say you could get better comparison speed and memory usage from a hash function that outputs 64-bit numbers directly. Please note: to get 64-bit integers natively in PHP, you need 64-bit hardware and a PHP build/installation configured for 64-bit. I have some code around here somewhere that I used to test our staging environments, which I could possibly dig out if you're interested.
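One way to get such a 64-bit number in PHP is to take raw hash bytes and unpack them into a native integer. The sketch below assumes a 64-bit PHP build; it uses xxh64 when available (PHP 8.1+) and otherwise falls back to the first 8 bytes of a raw md5 digest, which gives the integer form but not the speed benefit. The function name intFingerprint() is hypothetical.

```php
<?php
// Hypothetical helper: fingerprint an array as a native 64-bit PHP int.
function intFingerprint(array $data): int
{
    if (PHP_INT_SIZE !== 8) {
        throw new RuntimeException('A 64-bit PHP build is required.');
    }
    $json = json_encode($data, JSON_THROW_ON_ERROR);
    $algo = in_array('xxh64', hash_algos(), true) ? 'xxh64' : 'md5';
    // Raw binary output (third argument true), truncated to 8 bytes.
    $bytes = substr(hash($algo, $json, true), 0, 8);
    // 'J' reads an unsigned 64-bit big-endian value. PHP ints are signed,
    // so values above PHP_INT_MAX come back negative, which is harmless
    // for equality comparisons.
    return unpack('J', $bytes)[1];
}

var_dump(intFingerprint(['page' => 2, 'sort' => 'name']));
```

Comparing two ints with === avoids string comparison, and an int column in a database or an int array key is typically smaller than a 32-character hex string.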



By the way, I don't think you will get faster array serialization than json_encode. The heart of this problem is a large traversal plus string manipulation, so speed is essentially proportional to the verbosity of the output, and json_encode is very terse compared to PHP's serialize or export functions. I bet if you looked through enough comments on the PHP documentation pages, you might find someone who wrote a hash function that takes an array as input directly, but it would be a gamble whether it was any good.
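To see the verbosity difference for yourself, compare the output sizes of the three serializers on the same array. The sample data is arbitrary; the point is only that json_encode produces the shortest string, which also means less input for the hash function afterwards.

```php
<?php
// Compare output size of json_encode(), serialize() and var_export().
$data = ['id' => 7, 'tags' => ['php', 'cache'], 'active' => true];

$outputs = [
    'json_encode' => json_encode($data),
    'serialize'   => serialize($data),
    'var_export'  => var_export($data, true), // true: return instead of print
];

foreach ($outputs as $name => $string) {
    // Newlines from var_export are flattened for one-line display.
    printf("%-12s %3d bytes  %s\n", $name, strlen($string), str_replace("\n", ' ', $string));
}
```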

Feel free to ask questions if anything is unclear.



I think @Patrick M's explanation is really good; I just want to add a few more details.

As @Patrick M says, md5 is a cryptographic hash, which puts in some extra effort for security (even though it is no longer recommended for that purpose).

You can check which hash functions are available with hash_algos().
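For example, a quick check of what the current PHP build supports. The candidate list is arbitrary; which names exist depends on the PHP version (xxh3, for instance, only appears from PHP 8.1 on).

```php
<?php
// List how many algorithms hash() supports and check a few by name.
$available = hash_algos();
printf("%d algorithms available\n", count($available));

foreach (['md4', 'md5', 'sha1', 'crc32b', 'xxh3'] as $algo) {
    printf("%-7s %s\n", $algo, in_array($algo, $available, true) ? 'yes' : 'no');
}
```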

Here you can also see a benchmark of these algorithms, which says (I haven't tried it myself) that md4 is fastest, followed by md5, and so on.



To be fair, a benchmark should also test an algorithm's dedicated function, if there is one (md5(), sha1(), crc32(), ...), since it might be faster (or slower, who knows) than going through hash().
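A sketch of such a benchmark, timing the dedicated functions next to their hash() equivalents on the same input. It assumes PHP 7.4+ (arrow functions and hrtime()); the input size and iteration count are arbitrary assumptions, and the numbers will differ from machine to machine.

```php
<?php
// Time dedicated hash functions against hash($algo, ...) on one JSON string.
$json = json_encode(array_fill(0, 50, ['key' => 'value', 'n' => 123]));
$iterations = 20000;

$candidates = [
    'md5()'        => fn() => md5($json),
    'hash(md5)'    => fn() => hash('md5', $json),
    'sha1()'       => fn() => sha1($json),
    'hash(sha1)'   => fn() => hash('sha1', $json),
    'crc32()'      => fn() => crc32($json),
    'hash(crc32b)' => fn() => hash('crc32b', $json), // same CRC as crc32()
    'hash(md4)'    => fn() => hash('md4', $json),
];

$timings = [];
foreach ($candidates as $label => $fn) {
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $timings[$label] = (hrtime(true) - $start) / 1e6; // milliseconds
}

asort($timings); // fastest first
foreach ($timings as $label => $ms) {
    printf("%-13s %8.2f ms\n", $label, $ms);
}
```

Note that crc32() returns an integer while hash('crc32b') returns hex, so the two differ in output type as well as possibly in speed.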

In conclusion, I think your approach looks pretty good.

Just keep in mind that md5 is not secure, as described in the PHP Password FAQ, so if you need to store fingerprints of credit card numbers or something like that (which I don't think is your case), you may need a different function as well as some extra steps.
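One possible shape for those extra steps is a keyed HMAC with SHA-256 instead of plain md5(): without the secret key, nobody can precompute fingerprints for guessed inputs. This is only a sketch; the function name secureFingerprint() and reading the key from an environment variable named FINGERPRINT_KEY are hypothetical assumptions, and real key management needs more care than a fallback string.

```php
<?php
// Hypothetical keyed fingerprint for sensitive data.
function secureFingerprint(array $data, string $key): string
{
    // hash_hmac(algo, data, key) returns 64 hex characters for sha256.
    return hash_hmac('sha256', json_encode($data, JSON_THROW_ON_ERROR), $key);
}

// Hypothetical key source; getenv() returns false when the variable is unset.
$key = getenv('FINGERPRINT_KEY') ?: 'change-me';
echo secureFingerprint(['account' => 'example-value'], $key), "\n";
```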

Alternatively, you can look at spl_object_hash().
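When looking at it, keep in mind that spl_object_hash() identifies an object instance rather than hashing its contents, and it only accepts objects, so an array has to be cast first. The small demonstration below shows that two live objects with identical contents get different IDs; PHP may also reuse an ID after its object is destroyed, so it is probably not a drop-in replacement for cache file names that must stay stable between requests.

```php
<?php
// spl_object_hash() reflects object identity, not object contents.
$first  = (object) ['page' => 2, 'sort' => 'name'];
$second = (object) ['page' => 2, 'sort' => 'name'];

var_dump(spl_object_hash($first) === spl_object_hash($first));  // bool(true)
var_dump(spl_object_hash($first) === spl_object_hash($second)); // bool(false)
```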







