I'm trying to use the imagehash library (https://pypi.org/project/ImageHash/) to identify visually identical images. I'm testing with three files: the second is just a reduced-resolution copy of the first, and the third is a very different image. Images below.
I wrote a simple Python program to compare two images using imagehash:
from PIL import Image
import imagehash
import os
import sys


def gethash(relPath):
    # Resolve relPath against the directory the script is in
    script_dir = os.path.dirname(__file__)
    path = os.path.join(script_dir, relPath)
    # Perceptual hash (phash) of the image at that path
    return imagehash.phash(Image.open(path))


# Subtracting two hashes gives the number of differing bits
print(gethash(sys.argv[1]) - gethash(sys.argv[2]))
When I run it from the command line, images 1 and 2 have the same difference as images 1 and 3. What am I doing wrong with imagehash?
PS C:\pickle\lambda\hash> py .\testih.py .\img\1.jpg .\img\2.jpg
36
PS C:\pickle\lambda\hash> py .\testih.py .\img\1.jpg .\img\3.jpg
36
PS C:\pickle\lambda\hash> py .\testih.py .\img\2.jpg .\img\3.jpg
30
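For context on what these numbers mean: as far as I understand, subtracting two imagehash hashes yields the Hamming distance, i.e. the number of bits that differ, and with the default hash size of 8 each hash has 64 bits. The sketch below reproduces that calculation in plain Python on hex strings. The helper name `hamming_distance` and the sample hex values are made up for illustration; they are not the hashes of my images.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical 64-bit hashes (16 hex digits each), for illustration only.
h1 = "ffd8e0c0c0e0f0ff"
h2 = "ffd8e0c0c0e0f0fe"  # differs from h1 in the last bit
h3 = "00271f3f3f1f0f00"  # bitwise complement of h1

print(hamming_distance(h1, h2))  # 1
print(hamming_distance(h1, h3))  # 64
```

Two unrelated 64-bit hashes should differ in about 32 bits on average, so distances of 36 and 30 look like "unrelated" in every pairing, including the one where I expected a near-zero result.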
I have tried phash, average_hash, and dhash, all with similar results. Thank you for any advice!
1.jpg https://picklepics.app/misc/1.jpg