JS - How to check if 2 images (their hash) are similar


GOAL
Find a good way to check whether 2 images are similar by comparing their hash profiles. Each hash is a simple array of 0 and 1 values.

INTRO
I have 2 images. They are the same image with some small differences: one differs in brightness, rotation and shot.
What I want is a JavaScript method that compares the 2 images and calculates a percentage value telling how similar they are.

WHAT I'VE DONE
After loading the 2 images into an HTML5 canvas to get their image data, I used the pHash algorithm (www.phash.org) to obtain their hash representation.
The hash is an array of 0 and 1 values that recreates the image in a "simplified" form.
I've also created a JS script that generates an HTML table with black cells where the array contains 1.
The result is the following screenshot (the image is a Van Gogh picture):

Screenshot
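
The table rendering described above could look roughly like the sketch below. The asker's actual script is not shown, so `hashToTableHtml`, the `size` parameter and the 4px cell size are all assumptions made for illustration. It assumes the hash holds `size * size` values in row order.

```javascript
// Hypothetical helper: turns a flat 0/1 hash into an HTML table string.
// Assumption: the hash has size * size entries, stored row by row.
function hashToTableHtml(hash, size) {
  var html = '<table style="border-collapse:collapse">';
  for (var y = 0; y < size; y++) {
    html += '<tr>';
    for (var x = 0; x < size; x++) {
      // Black cell where the hash contains 1, white cell otherwise.
      var color = hash[y * size + x] === 1 ? '#000' : '#fff';
      html += '<td style="width:4px;height:4px;background:' + color + '"></td>';
    }
    html += '</tr>';
  }
  return html + '</table>';
}
```

In a browser the returned string can be assigned to the `innerHTML` of a container element to display the hash.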

Now I need to compare the 2 arrays to obtain a percentage value that says "how much" they are similar.
Most of the JavaScript hash algorithms I found by googling already come with a comparison algorithm: the Hamming distance. It's very simple and fast, but not very precise. In fact, the Hamming distance says that the 2 images in my screenshot are 67% similar.

THE QUESTION
Starting with 2 simple arrays, with the same length, filled with 0 and 1 values: what could be a good algorithm to determine similarity more precisely?

NOTES
- Pure JavaScript development, no third-party plugins or frameworks.
- No need for a complex algorithm that finds the right similarity when the 2 images are the same but look very different (strong rotation, totally different colors, etc.).

Thanx

PHASH CODE

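  // imgData is the canvas ImageData; size is the image width in pixels (for example 128).
  // array_sum is a helper (not shown) assumed to return the sum of the array's values;
  // array.forEach is a library helper too (see the NOTE below).
  var pixels = [];

  for (var i = 0; i < imgData.data.length; i += 4) {
    // Each pixel takes 4 entries (R, G, B, A), so i/4 is the pixel index.
    var j = i / 4;
    var y = Math.floor(j / size);
    var x = j - (y * size);

    var pixelPos = x + (y * size);
    var r = imgData.data[i];
    var g = imgData.data[i + 1];
    var b = imgData.data[i + 2];

    // Convert RGB to a single greyscale value.
    var gs = Math.floor((r * 0.299) + (g * 0.587) + (b * 0.114));
    pixels[pixelPos] = gs;
  }

  // A bit is 1 when the pixel is brighter than the average grey value.
  var avg = Math.floor(array_sum(pixels) / pixels.length);
  var hash = [];
  array.forEach(pixels, function (px, i) {
    hash[i] = (px > avg) ? 1 : 0;
  });

  return hash;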

HAMMING DISTANCE CODE

  // hash1 and hash2 are the arrays of the "coded" images.
  
  var similarity = hash1.length;
  
  array.forEach(hash1, function(val,key){
    if(hash1[key] != hash2[key]){
      similarity--;
    }
  });

  var percentage = (similarity/hash1.length*100).toFixed(2);

NOTE: array.forEach is not pure JavaScript. Treat it as a replacement for: for (var i = 0; i < array.length; i++).
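
For reference, the same comparison can be written with a plain loop and no library helper. This is only a sketch: `hammingSimilarity` is a name chosen here, and it returns a number rather than the formatted string produced by `toFixed(2)`.

```javascript
// Pure-JavaScript sketch of the Hamming comparison (hypothetical function name).
// Returns the percentage of positions where the two hashes agree (0 to 100).
function hammingSimilarity(hash1, hash2) {
  if (hash1.length === 0 || hash1.length !== hash2.length) {
    throw new Error('Hashes must be non-empty and have the same length');
  }
  var distance = 0; // number of positions that differ
  for (var i = 0; i < hash1.length; i++) {
    if (hash1[i] !== hash2[i]) {
      distance++;
    }
  }
  return (1 - distance / hash1.length) * 100;
}
```

One thing worth keeping in mind when reading the result: two unrelated hashes with roughly as many 0s as 1s already agree on about half of their bits by chance, so a score of 67% sits much closer to "unrelated" than the number suggests.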

ANSWER 1

I'm using blockhash, and it seems pretty good so far. The only false positives I get are with pictures where half of the image is the same background color, which is to be expected =/

http://blockhash.io/

BlockHash may be slower than yours but it should be more accurate.

What you do is just calculate the greyscale value of EACH pixel and compare it to the average to create your hash.

What BlockHash does is split the picture into small rectangles of equal size, sum the RGB values of the pixels inside each one, and compare each sum to the median of its horizontal band (the blocks are grouped into 4 horizontal bands).
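
To make the block idea concrete, here is a simplified sketch of it. It is not the blockhash library's code: `blockMedianHash` and `median` are names made up here, and the sketch assumes that the image width and height are exact multiples of `bits` and that `bits` is a multiple of 4, which a real implementation would not require.

```javascript
// Simplified sketch of a block/median hash (hypothetical names, not the library API).
// Assumptions: data is a flat RGBA array like ImageData.data, width and height
// are multiples of bits, and bits is a multiple of 4.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function blockMedianHash(data, width, height, bits) {
  var blockW = width / bits;
  var blockH = height / bits;
  var sums = [];
  for (var i = 0; i < bits * bits; i++) sums.push(0);

  // Add R + G + B of every pixel to the block it falls into.
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      var p = (y * width + x) * 4;
      var block = Math.floor(y / blockH) * bits + Math.floor(x / blockW);
      sums[block] += data[p] + data[p + 1] + data[p + 2];
    }
  }

  // Group the block rows into 4 horizontal bands and compare each block
  // with the median of its own band.
  var bandSize = sums.length / 4;
  var hash = [];
  for (var band = 0; band < 4; band++) {
    var start = band * bandSize;
    var m = median(sums.slice(start, start + bandSize));
    for (var k = start; k < start + bandSize; k++) {
      hash.push(sums[k] > m ? 1 : 0);
    }
  }
  return hash;
}
```

Because every bit describes a whole block instead of a single pixel, small shifts and noise inside a block tend to cancel out, which is the main reason this kind of hash is less fragile than a per-pixel one.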

So it is normal that it takes longer, but it is still pretty efficient and accurate.

I'm doing it with pictures of a good resolution, at minimum 1000x800, and I use 16 bits per row. That gives 16 x 16 = 256 bits, i.e. a 64-character hexadecimal hash. When using the Hamming distance provided by the same library, I see good results with a threshold of 10 (at most 10 differing bits).
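
As an illustration of how such hexadecimal hashes can be compared, here is a small sketch. The function names are invented for this example and are not the blockhash library's API. On a 256-bit hash, a threshold of 10 means the two hashes may differ in roughly 4% of their bits.

```javascript
// Sketch: Hamming distance between two equal-length hexadecimal hash strings.
// hexHammingDistance and looksSimilar are hypothetical names.
function hexHammingDistance(hexA, hexB) {
  if (hexA.length !== hexB.length) {
    throw new Error('Hashes must have the same length');
  }
  var distance = 0;
  for (var i = 0; i < hexA.length; i++) {
    // XOR the two 4-bit digits, then count the bits that are set.
    var diff = parseInt(hexA[i], 16) ^ parseInt(hexB[i], 16);
    while (diff) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// Two images are treated as similar when at most `threshold` bits differ.
function looksSimilar(hexA, hexB, threshold) {
  return hexHammingDistance(hexA, hexB) <= threshold;
}
```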

Your idea of using greyscale isn't bad at all. But you should average out portions of the image instead of comparing each pixel. That way you can compare a thumbnail version to its original, and get pretty much the same hash!
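
A minimal sketch of that suggestion, keeping the question's greyscale weights and average threshold but applying them to block averages instead of single pixels. The name `blockAverageHash` and its parameters are assumptions for this example. It also assumes a square image whose side is an exact multiple of `blocks`.

```javascript
// Sketch: average-threshold hash computed on block averages (hypothetical name).
// Assumptions: data is a flat RGBA array, the image is size x size pixels,
// and size is a multiple of blocks (the number of blocks per row).
function blockAverageHash(data, size, blocks) {
  var blockSize = size / blocks;
  var sums = [];
  for (var i = 0; i < blocks * blocks; i++) sums.push(0);

  // Accumulate the grey value of every pixel into its block.
  for (var y = 0; y < size; y++) {
    for (var x = 0; x < size; x++) {
      var p = (y * size + x) * 4;
      var grey = data[p] * 0.299 + data[p + 1] * 0.587 + data[p + 2] * 0.114;
      sums[Math.floor(y / blockSize) * blocks + Math.floor(x / blockSize)] += grey;
    }
  }

  // Mean grey value per block, then threshold against the overall mean.
  var pixelsPerBlock = blockSize * blockSize;
  var means = sums.map(function (s) { return s / pixelsPerBlock; });
  var avg = means.reduce(function (a, b) { return a + b; }, 0) / means.length;
  return means.map(function (m) { return m > avg ? 1 : 0; });
}
```

Since the hash length depends only on `blocks`, an original and its thumbnail produce hashes of the same length that can be compared directly, whatever their pixel dimensions.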

ANSWER 2

I don't know if this will do the trick, but you can simply count the positions where the 0 and 1 values match between the arrays:

const arr1 = [1,1,1,1,1,1,1,1,1,1],
      arr2 = [0,0,0,0,0,0,0,0,0,0],
      arr3 = [0,1,0,1,0,1,0,1,0,1],
      arr4 = [1,1,1,0,1,1,1,0,1,1]

const howSimilar = (a1,a2) => {
    let similarity = 0
    a1.forEach( (elem,index) => {
        if(a2[index]==elem) similarity++
    })
    let percentage = parseInt(similarity/a1.length*100) + "%"
    console.log(percentage)
}

howSimilar(arr1,arr2) // 0%
howSimilar(arr1,arr3) // 50%
howSimilar(arr1,arr4) // 80%