possible to recreate a file using metadata, SHA, and a few bytes?

195 Views Asked by At

I may be a fool, but I hope that it would be possible to recreate an exact copy of a file if we only have:

  • it's exact lenght and type
  • a few of it's begining bytes( suppose 100)
  • it's md5 and sha256 signatures.

I do not know if this possible, but so far I haven't been able to find online anything ressembling what I'm describing.

Sidenote: I know such process would be computationally expensive, so if possible I intend to try it with small files(1-3mb)

2

There are 2 best solutions below

2
On BEST ANSWER

Unfortunately, this is not practical, and given your example of "small files (1-3mb)", you're way off in scale of what current computers will be able to do in terms of performance.

A simple Wolfram Alpha query to calculate how long

  • 128 bytes of data file (256^128 possible combinations of byte sequences)
  • when you can calculate a billion billion billion (10^9 * 10^9 * 10^9 = 10^27) digests every second
  • ... in minutes, then hours, then days, then years

would take, and you end up with....

10463951242053391806136963369726580181263718864311851635192874886429209483641954321222640418122029864527291727710479949464718215680589004332016189037791576956967351342601788071700268169006221818240189631008834448226154239518944108944497601509840881752510934060240763835605888507473266002770708660224/1835636794567108154296875 years

or shortened to ...

5.7004475357125694689539104223396268823502567825415606695024 × 10^273

aka "something something" with 273 zeroes at the end ... years.

Even taking in luck and average number of attempts before hitting the right number and you're to half of that, which is still... unfortunately several orders of magnitude larger than the expected heat death of our universe.

You will need a targetted attack on the digest algorithm you chose, exploiting known weaknesses. SHA-1 has been "broken" in terms of having targetted collisions being made, but most of the other SHA-x implementations are still safe, relatively speaking.

Now, I picked "billion billion billion" as a rather generous estimate of how many digests you can computer in a second. I have no fricking idea if this is accurate or not but the result with 273 zeroes should tell you that even if I was off by an order of magnitude or ten, you're still far from there.

According to this answer from 2012, current computers (back then) could compute 100 million hashes per second of SHA512. If we apply Moore's Law to this and just double the performance every 18 months which I believe was the norm, we could potentially calculate 100 million * 2^(6) [=64] hashes today, which is roughly 6.4 billion hashes per second. We're still a billion billion hashes away in speed magnitude to reach our target (which still has 273 zeroes behind it).

0
On

Even with all the magic computing power you can imagine, what you are asking for is impossible. If you take a 1 MB file and compute its MD5 and SHA256, you are left with 384 bits. I will ignore the first 100 bytes of the file, which you say you know. You are still left with over eight million bits. Subtracting the 384, you still have over eight million bits.

That means that there are approximately 28x106, (~102,400,000) different 1 MB files that all map to exactly the same MD5 and SHA256 signatures. Even you could compute all of them, there is no way to determine which of them was the original file.