I need to detect duplicate images. This is what I did:
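// needs: java.math.BigInteger, java.nio.file.Files, java.security.MessageDigest
try {
    MessageDigest messageDigest = MessageDigest.getInstance("SHA-512");
    // Files.readAllBytes reads the whole file and closes it afterwards;
    // a single InputStream.read call is not guaranteed to fill the buffer.
    byte[] fileData = Files.readAllBytes(file.toPath());
    // %0128x keeps leading zeros, so the hex digest is always 128 characters long.
    return String.format("%0128x", new BigInteger(1, messageDigest.digest(fileData)));
}
catch (Exception e) {
    throw new RuntimeException("cannot read file " + file.getAbsolutePath(), e);
}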
This doesn't work for all images because, of course, the files need to be byte-for-byte identical. For instance, these two images look practically the same but have different file sizes (195 KB versus 196 KB):
https://i.stack.imgur.com/rzXq9.jpg https://i.stack.imgur.com/bRY2r.jpg
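To illustrate why near-identical files cannot share a hash, here is a minimal sketch that hashes two byte arrays differing in a single bit. The class name `HashSensitivityDemo`, the helper `sha512Hex`, and the sample bytes are made up for illustration; they stand in for real image data.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashSensitivityDemo {

    // Hypothetical helper: hex-encoded SHA-512 of a byte array, zero-padded to 128 characters.
    public static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            return String.format("%0128x", new BigInteger(1, md.digest(data)));
        } catch (NoSuchAlgorithmException e) {
            // Not expected with a standard JDK, which ships a SHA-512 implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder content; in practice these would be the bytes of two image files.
        byte[] original = "pretend these are image bytes".getBytes(StandardCharsets.UTF_8);
        byte[] altered = original.clone();
        altered[0] ^= 1; // flip a single bit in the first byte

        String first = sha512Hex(original);
        String second = sha512Hex(altered);
        System.out.println(first);
        System.out.println(second);
        System.out.println("equal: " + first.equals(second));
    }
}
```

A cryptographic hash is designed so that any change to the input, however small, yields an unrelated digest, so re-encoded or resized copies will never match this way.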
What bothers me, however, is that I got two different hashes for two copies of an image that have exactly the same size, the same colour profile, the same resolution, etc. (I'm not sure I can post the image, since it's a person's face and I haven't asked for their consent.)
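One way to check whether two such files differ only in bytes that never reach the decoded picture (embedded metadata, for example) is to hash the decoded pixels instead of the raw file. The sketch below is an assumption about how that could be done, not a confirmed diagnosis of my case; `PixelHash` and `pixelHash` are hypothetical names, and it relies on `ImageIO` being able to decode the format.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.imageio.ImageIO;

public class PixelHash {

    // Hypothetical helper: decodes the file and hashes its pixels rather than its bytes.
    public static String pixelHash(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the file.
            throw new IOException("no ImageIO reader for " + file.getAbsolutePath());
        }
        return pixelHash(image);
    }

    // Hashes width, height, and the ARGB value of every pixel, row by row.
    public static String pixelHash(BufferedImage image) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Not expected with a standard JDK, which ships a SHA-512 implementation.
            throw new IllegalStateException(e);
        }
        int width = image.getWidth();
        int height = image.getHeight();
        // Include the dimensions so that e.g. 2x8 and 4x4 images cannot collide trivially.
        md.update(ByteBuffer.allocate(8).putInt(width).putInt(height).array());

        int[] row = new int[width];
        ByteBuffer rowBytes = ByteBuffer.allocate(width * 4);
        for (int y = 0; y < height; y++) {
            image.getRGB(0, y, width, 1, row, 0, width); // one row of ARGB pixels
            rowBytes.clear();
            rowBytes.asIntBuffer().put(row);
            md.update(rowBytes.array());
        }
        return String.format("%0128x", new BigInteger(1, md.digest()));
    }
}
```

If `pixelHash(fileA).equals(pixelHash(fileB))` holds while the file hashes differ, the difference lies outside the pixel data. If the pixel hashes differ too, the images are not pixel-identical even though they look the same, and an exact hash of either kind will not group them.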