Recently I ended up with some corrupted JPEG images after mistakenly running this command:
~$> sed -i 's/;/_/g' *
After that, every 0x3b byte in the JPEG images in the working directory and its subdirectories became 0x5f. Viewer apps display the images as corrupted, as in the sample below: corrupted image sample
I could not identify which bytes should be restored. When I tried to check the images for warning/error flags with tools such as ExifTool, they just returned OK, because the corrupted JPEGs are not broken badly enough to prevent a viewer from opening them.
The images need to be repaired, since I have no backup copies of them, but I don't know where to start. Simply replacing every 0x5f with 0x3b again does not work, because some of those bytes were 0x5f originally, and the number of combinations is too large for trial and error (2^n, I guess, where n is the number of candidate 0x5f bytes). I've just started parsing the Huffman tables in the JPEG header, hoping to identify the points where the Huffman-coded stream conflicts with the binary data, but I'm not sure this will work.
How can I recover the images in this situation? I appreciate your help.
There appear to be 57 occurrences of 0x5f in your corrupted image. If you can't find a better way, you could maybe "eyeball" the effects of replacing the incorrect bytes in the image fairly quickly like this:
- open the image in binary mode and read it all with JPEG = open('PdQpR.jpg','rb').read()
- use offsets = [m.start() for m in re.finditer(b'_', JPEG)] to find the byte offsets of the 57 occurrences
- display the image with cv2.imdecode() and cv2.imshow() and then enter a loop accepting keypresses with cv2.waitKey()
  p = move to previous one of 57 occurrences
  n = move to next one of 57 occurrences
  SPACE = toggle between 0x5f and 0x3b
  s = save current state
  q = quit
I had a quick attempt at this but haven't had much success using it yet:
Note: Toggling some bytes between '_' and ';' results in illegal images and error messages from cv2.imdecode() and/or cv2.imshow(). Ideally you would wrap these inside a try/except and back out the last change if they occur. I didn't do that, yet.
Note: I didn't implement the save function; it is just something like open('corrected.jpg', 'wb').write(JPEG)
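For reference, the steps and the two notes above could be put together roughly as follows. This is only a minimal sketch, not the code from my attempt: the names find_offsets, toggle and review are made up for illustration, it assumes OpenCV (cv2) and NumPy are installed, and it assumes that a failed decode shows up either as a None result or as a cv2.error.

```python
import re

def find_offsets(data):
    """Return the byte offsets of every 0x5f ('_') in the data."""
    return [m.start() for m in re.finditer(b'_', data)]

def toggle(buf, offset):
    """Flip one byte of a bytearray between 0x5f ('_') and 0x3b (';')."""
    buf[offset] = 0x3B if buf[offset] == 0x5F else 0x5F

def review(path, out_path='corrected.jpg'):
    """Interactive loop: p/n = previous/next, SPACE = toggle, s = save, q = quit."""
    # Imported here so the helpers above can be used without OpenCV.
    import cv2
    import numpy as np

    with open(path, 'rb') as f:
        buf = bytearray(f.read())
    offsets = find_offsets(bytes(buf))
    if not offsets:
        raise ValueError('no 0x5f bytes found')

    def decode():
        # Treat both a None result and a cv2.error as "illegal image".
        try:
            return cv2.imdecode(np.frombuffer(bytes(buf), dtype=np.uint8),
                                cv2.IMREAD_COLOR)
        except cv2.error:
            return None

    img = decode()
    if img is None:
        raise ValueError('image cannot be decoded at all')

    i = 0
    while True:
        cv2.imshow('JPEG', img)
        key = cv2.waitKey(0) & 0xFF
        if key == ord('q'):
            break
        elif key == ord('n'):
            i = (i + 1) % len(offsets)
        elif key == ord('p'):
            i = (i - 1) % len(offsets)
        elif key == ord(' '):
            toggle(buf, offsets[i])
            candidate = decode()
            if candidate is None:
                toggle(buf, offsets[i])  # back out the change that broke decoding
            else:
                img = candidate
        elif key == ord('s'):
            with open(out_path, 'wb') as f:
                f.write(buf)
    cv2.destroyAllWindows()
```

Calling review('PdQpR.jpg') would start the loop. The offsets are computed once up front, so a byte that has been toggled to ';' stays in the list and can be toggled back later.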