Sorry for my bad english, i have a small database that contains hashes of photos, when I try to find similar photos to the one below:
for which the following hash was calculated: "0f3f2764ecc482c2" using the method average_hash()
imagehash.average_hash(Image.open(imagePath))
The system finds a very large number of collisions, below is an example of photos that were identified as completely identical:
The table in which I store hashes of photos:
CREATE TABLE IF NOT EXISTS photos(id_photo BIGINT PRIMARY KEY, photo_hash TEXT, FOREIGN KEY (id_photo) REFERENCES users (id));
Adding hash:
cur.execute('''INSERT INTO photos(id_photo, photo_hash) VALUES(%s, %s)''', id, str(result_image_recognition['photo_hash'])])
SQL Query by which I calculate the Euclidean distance between my hash and stored hashes:
SELECT id_photo, BIT_COUNT(photo_hash ^ '0f3f2764ecc482c2') FROM photos;
The photos table contains 7889 photos, 959 of them are mistakenly determined by this query as completely identical (Euclidean distance is 0). About a week I can not solve this problem please someone help me.
You need to convert the hex strings to integers before doing xor operations
because all strings the first character of which is not 1-9 are converted to 0.