Finding similar hashes

282 Views Asked by At

I'm trying to find 2 different plain text words that create very similar hashes.

I'm using the hashing method 'whirlpool', but I don't really need my question to be answered in the case or whirlpool, if you can using md5 or something easier that's ok.

The similarities i'm looking for is that they contain the same number of letters (doesnt matter how much they're jangled up)

i.e plaintext 'test' hash 1: abbb5 has 1 a , 3 b's , one 5 plaintext 'blahblah' hash 2: b5bab must have the same, but doesnt matter what order.

I'm sure I can read up on how they're created and break it down and reverse it, but I am just wondering if what I'm talking about occurs.

I'm wondering because I haven't found a match of what I'm explaining (I created a PoC to run threw random words / letters till it recreated a similar match), but then again It would take forever doing it the way i was dong it. and was wondering if anyone with real knowledge of hashes / encryption would help me out.

1

There are 1 best solutions below

2
Maarten Bodewes On

So you can do it like this:

  1. create an empty sorted map \
  2. create a 64 bit counter (you don't need more than 2^63 inputs, in all probability, since you would be dead before they would be calculated - unless quantum crypto really takes off)
  3. use the counter as input, probably easiest to encode it in 8 bytes;
  4. use this as input for your hash function;
  5. encode output of hash in hex (use ASCII bytes, for speed);
  6. sort hex on number / alphabetically (same thing really)
  7. check if sorted hex result is a key in the map
    • if it is, show hex result, the old counter from the map & the current counter (and stop)
    • if it isn't, put the sorted hex result in the map, with the counter as value
  8. increase counter, goto 3

That's all folks. Results for SHA-1:

011122344667788899999aaaabbbcccddeeeefff for both 320324 and 429678

I don't know why you want to do this for hex, the hashes will be so large that they won't look too much alike. If your alphabet is smaller, your code will run (even) quicker. If you use whole output bytes (i.e. 00 to FF instead of 0 to F) instead of hex, it will take much more time - a quick (non-optimized) test on my machine shows it doesn't finish in minutes and then runs out of memory.