Hash for duplicate images doesn't work for all duplicates


I need to detect duplicates of images. This is what I did:

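// Needs: java.math.BigInteger, java.nio.file.Files, java.security.MessageDigest
try {
    MessageDigest messageDigest = MessageDigest.getInstance("SHA-512");
    // Read the whole file; a single InputStream.read(byte[]) call is not
    // guaranteed to fill the buffer, and the stream leaked on failure.
    byte[] fileData = Files.readAllBytes(file.toPath());
    // Hex string of the digest (BigInteger.toString(16) omits leading zeros).
    return new BigInteger(1, messageDigest.digest(fileData)).toString(16);
}
catch (Exception e) {
    throw new RuntimeException("cannot read file " + file.getAbsolutePath(), e);
}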

It doesn't work for all images because, of course, the files need to be byte-for-byte identical for the hashes to match. For instance, these two images look practically the same but have different file sizes (195 KB versus 196 KB):

https://i.stack.imgur.com/rzXq9.jpg https://i.stack.imgur.com/bRY2r.jpg
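
For near-identical images like the pair above, a perceptual hash is one commonly used alternative to a cryptographic hash. The following is only a minimal sketch of an "average hash", not part of my original code: the class name `AverageHash` and its methods `hash` and `distance` are hypothetical, and the 8x8 grid size is an assumption. It shrinks the image to 8x8 grayscale, sets one bit per pixel that is brighter than the mean, and compares two hashes by counting differing bits.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, for illustration only.
final class AverageHash {
    // Returns a 64-bit hash: bit i is set when pixel i of the
    // 8x8 grayscale thumbnail is brighter than the thumbnail's mean.
    static long hash(BufferedImage src) {
        BufferedImage small = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, 8, 8, null); // scale down and convert to gray
        g.dispose();

        int[] gray = new int[64];
        long sum = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int v = small.getRaster().getSample(x, y, 0);
                gray[y * 8 + x] = v;
                sum += v;
            }
        }
        long mean = sum / 64;

        long bits = 0L;
        for (int i = 0; i < 64; i++) {
            if (gray[i] > mean) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Hamming distance: number of bits in which the two hashes differ.
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Two files would then be treated as probable duplicates when `AverageHash.distance(...)` is small. The cut-off (for example, a handful of bits out of 64) is an assumption that has to be tuned on real data, and unlike SHA-512 this kind of hash can produce false positives.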

What bothers me, however, is that I got two different hashes for two image files that have exactly the same size, the same colour profile, the same resolution, etc. (I'm not sure whether I can post the image, since it shows a person's face and I haven't asked for their consent.)
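
One way to narrow this down would be to hash the decoded pixels instead of the raw file bytes, since two files can share size and resolution while still differing in metadata or in the encoded data. The following is a rough sketch under that assumption. The class name `PixelHash` and its method `hash` are hypothetical, and the image is assumed to have been decoded already, for example with `javax.imageio.ImageIO.read(file)`.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class PixelHash {
    // SHA-512 over the image dimensions and the ARGB value of every pixel,
    // so bytes outside the pixel data (e.g. metadata) do not affect the result.
    static String hash(BufferedImage img) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
        int w = img.getWidth();
        int h = img.getHeight();
        // Include the dimensions so that e.g. 2x8 and 4x4 images cannot collide.
        md.update(ByteBuffer.allocate(8).putInt(w).putInt(h).array());

        int[] row = new int[w];
        ByteBuffer buf = ByteBuffer.allocate(w * 4);
        for (int y = 0; y < h; y++) {
            img.getRGB(0, y, w, 1, row, 0, w); // one row of ARGB pixels
            buf.clear();
            buf.asIntBuffer().put(row);        // writes into buf's backing array
            md.update(buf.array());
        }

        // Fixed-width hex so leading zero bytes are preserved.
        StringBuilder hex = new StringBuilder(128);
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

If the pixel hashes of the two files match while the file hashes differ, the difference lies outside the decoded pixels. If the pixel hashes differ as well, the images were most likely re-encoded, and an exact hash of either kind cannot match them.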
