Hash collision example (printable strings)

I searched a lot for MD5 hash collisions, but I only found binary examples. I would like to find two UTF-8 strings that have the same MD5 hash. Are there any, or do collisions only exist for binary data?



This is definitely possible:

  • We all agree that MD5 has collisions, by the pigeonhole principle: it maps infinitely many possible inputs onto a finite set of 2^128 possible digests.
  • In fact there are infinitely many collisions: there are infinitely many inputs but only finitely many digests, so at least one digest (and, since MD5 spreads its inputs roughly evenly, presumably every digest) has infinitely many preimages.
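The pigeonhole argument can be seen at toy scale. A blind search for a full 128-bit MD5 collision would need about 2^64 hashes, so this sketch (the helpers `truncated_md5` and `find_printable_collision` are hypothetical, written only for illustration) truncates MD5 to its first 3 bytes and runs a birthday search over short lowercase strings. With only 2^24 possible truncated digests, a collision between two printable strings turns up almost immediately.

```python
import hashlib
import itertools
import string


def truncated_md5(s: str, nbytes: int) -> bytes:
    """First `nbytes` bytes of the MD5 digest of s (UTF-8 encoded)."""
    return hashlib.md5(s.encode("utf-8")).digest()[:nbytes]


def find_printable_collision(nbytes: int = 3):
    """Birthday search: hash candidate strings until two share a truncated digest."""
    seen = {}  # truncated digest -> first string that produced it
    # Enumerate "a", "b", ..., "z", "aa", "ab", ... so every candidate is distinct.
    for length in itertools.count(1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            s = "".join(chars)
            h = truncated_md5(s, nbytes)
            if h in seen:
                return seen[h], s, h
            seen[h] = s


a, b, h = find_printable_collision(3)
print(a, b, h.hex())
```

The same search works for the full digest in principle; it just needs on the order of 2^64 attempts instead of 2^12.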

That being said, some of these collisions are bound to be valid UTF-8 strings, but they are extremely rare, since most colliding inputs are just random binary garbage.
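To get a feel for how rare valid UTF-8 is among arbitrary bytes, this sketch estimates, by sampling, the fraction of uniformly random byte strings of a given length that decode as strict UTF-8. Treating collision-attack output as uniformly random bytes is an assumption; the real messages are structured, but mostly binary in the same way.

```python
import random


def utf8_valid_fraction(length: int, trials: int = 10000, seed: int = 0) -> float:
    """Estimate the fraction of random byte strings of `length` that are valid UTF-8."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(trials):
        data = bytes(rng.getrandbits(8) for _ in range(length))
        try:
            data.decode("utf-8")  # strict decoding raises on any invalid sequence
            valid += 1
        except UnicodeDecodeError:
            pass
    return valid / trials


for n in (1, 2, 4, 8, 16, 32):
    print(n, utf8_valid_fraction(n))
```

A single random byte is valid half the time (any ASCII byte), but the fraction collapses quickly with length; at 128 bytes, the size of a typical MD5 collision pair, it is negligible.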

If you need to find such collisions, I recommend using the collision finder written by Patrick Stach, which should return a pair of random colliding messages within a few hours, or my attempt to improve it. The latter uses the techniques presented in later work by Wang (the first person to demonstrate MD5 collisions), Liang, Sasaki, Yajima, and Klima.

I think you could also use a length extension attack to some extent, but that requires a deeper understanding of what's going on inside MD5.
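The property this most likely relies on is MD5's block-by-block (Merkle-Damgård) structure: once two equal-length messages reach the same internal state, appending the same suffix to both keeps them colliding. This sketch checks that with the well-known 128-byte colliding pair from Wang et al. (the suffix is an arbitrary example). Note that it does not make the colliding blocks themselves printable; it only lets you append printable text after them.

```python
import hashlib

# Well-known pair of colliding 128-byte messages, hex-encoded.
M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

# Both messages are exactly two 64-byte blocks and leave MD5 in the same
# internal state, so any common suffix preserves the collision.
suffix = b"any common suffix, e.g. some printable text"
ext1 = hashlib.md5(M1 + suffix).hexdigest()
ext2 = hashlib.md5(M2 + suffix).hexdigest()
print(ext1 == ext2)
```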



UTF-8 collisions do exist. By the nature of cryptographic hashes, finding them is deliberately difficult, even for a broken hash like MD5.

You can find MD5 rainbow tables, which are used to crack passwords and can therefore turn some hashes back into UTF-8 strings. As @alk pointed out, a brute-force search would take a very long time.
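To put "a very long time" in numbers, this sketch applies the generic bounds for a 128-bit hash: about 2^64 hashes for a birthday-style collision search and about 2^128 to reverse a single hash without a table. The rate of 10^9 hashes per second is an assumed figure for illustration, not a benchmark.

```python
DIGEST_BITS = 128                    # MD5 output size
HASHES_PER_SECOND = 1e9              # assumed hashing rate (illustrative only)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Generic (brute-force) work factors for an n-bit hash.
collision_work = 2 ** (DIGEST_BITS // 2)   # birthday bound
preimage_work = 2 ** DIGEST_BITS           # exhaustive search for one preimage

collision_years = collision_work / HASHES_PER_SECOND / SECONDS_PER_YEAR
preimage_years = preimage_work / HASHES_PER_SECOND / SECONDS_PER_YEAR

print(f"generic collision search: ~{collision_years:,.0f} years")
print(f"generic preimage search:  ~{preimage_years:.1e} years")
```

Under these assumptions the generic collision search comes out at roughly 585 years; the dedicated attacks by Wang and others are far cheaper than this generic bound, which is why the tools mentioned in the previous answer finish in hours.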



I recently ran into a very simple hash collision in my project. I am using pyhashxx, a Python wrapper for xxHash: https://github.com/ewencp/pyhashxx

```python
import pyhashxx

s1 = 'mdsAnalysisResult105588'
s2 = 'mdsAlertCompleteResult360224'
pyhashxx.hashxx(s1)  # Out: 2535747266
pyhashxx.hashxx(s2)  # Out: 2535747266
```


This caused a very tricky caching issue in our system, and it took me a while to realize it was a hash collision.
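Collisions like this are expected with a short hash. Assuming the hash used here is 32 bits wide (the value 2535747266 fits in 32 bits) and behaves like a uniformly random function, the birthday approximation gives the chance that a cache with n keys contains at least one collision. The helper `collision_probability` is hypothetical, written for this sketch.

```python
import math


def collision_probability(n_keys: int, bits: int = 32) -> float:
    """Approximate probability that at least two of n_keys random hashes collide."""
    buckets = 2.0 ** bits
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2 * buckets)).
    # expm1 keeps the result accurate when the probability is tiny.
    return -math.expm1(-n_keys * (n_keys - 1) / (2 * buckets))


for n in (1_000, 10_000, 77_163, 100_000, 1_000_000):
    print(f"{n:>9} keys -> {collision_probability(n):.3f}")
```

With these assumptions, about 77,000 keys already give roughly even odds of a collision, so a 32-bit hash should not be used on its own as a cache key.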



The canonical example of an MD5 hash collision (hex-encoded):

Message 1:

d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70


Message 2:

d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70


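These contain no NUL bytes, so they can be stored and passed around as ordinary null-terminated strings. They are not valid UTF-8, though: the very first byte, 0xd1, starts a two-byte UTF-8 sequence but is followed by 0x31, which is not a continuation byte. Decoded leniently, they are meaningless and look like garbage: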

Message 1:

1i=\/ʵF~@X>U4 䈃%qAZQ%ɟ7<[؂>1V4[m6Sⴇ9cH͠3BW~Tp


(some characters were control characters)

Message 2:

1i=\/ʵF~@X>U4    䈃%AZQ%ɟr7<[؂>1V4[m6S49cH͠3BW~Tp(


(same here)
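A quick check with Python's standard library (a sketch; the hex is copied from the two messages above, and `first_utf8_error` is a hypothetical helper) shows that neither message contains a NUL byte, but both fail strict UTF-8 decoding at the very first byte. Lenient decoding with `errors="replace"` substitutes U+FFFD for each invalid sequence, which is where the garbage look comes from.

```python
M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)


def first_utf8_error(msg: bytes):
    """Return (start, end) of the first invalid UTF-8 sequence, or None if valid."""
    try:
        msg.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.end
    return None


for name, msg in (("Message 1", M1), ("Message 2", M2)):
    print(name, "contains NUL:", b"\x00" in msg)
    print(name, "first invalid UTF-8 sequence at:", first_utf8_error(msg))
    # Count how many replacement characters lenient decoding produces.
    print(name, "U+FFFD count:", msg.decode("utf-8", errors="replace").count("\ufffd"))
```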

Oh, and before I forget, here's the MD5 hash:
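It can be recomputed from the hex dumps with Python's standard `hashlib` module. This sketch also confirms that the two messages differ in exactly six bytes yet share one digest.

```python
import hashlib

M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

digest1 = hashlib.md5(M1).hexdigest()
digest2 = hashlib.md5(M2).hexdigest()
differing = [i for i, (x, y) in enumerate(zip(M1, M2)) if x != y]

print("MD5 of message 1:", digest1)
print("MD5 of message 2:", digest2)
print("identical digests:", digest1 == digest2)
print("differing byte offsets:", differing)
```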




