Can the phash algorithm be used in the code from another question?


Slow performance in hashing images in Python

Hello, my question refers to the question linked above.

I'm trying to compare pictures using perceptual hashing, and I'm very interested in boosting performance. I have only just begun experimenting with the phash algorithm.

The code in the linked question uses "dhash", but its text mentions "phash".

Can the performance-boosting idea from that question be adapted to the phash algorithm?

Can I reuse that code for perceptual hashing by simply changing the word "dhash" to "phash"?

Thanks a lot for your help!

Jörg

Ecuador

Phash is normally a slower algorithm than dhash, but there is a very fast open-source phash implementation: Image::PHash. The only issue is that it is written in Perl. The important part is in C, so it could easily be ported to a Python module, but somebody has to do it. If you don't mind using Perl, you can use it directly, e.g. to process the randomly generated images from the post you linked to:

parallel magick -size 640x480 xc: +noise random {}.jpg ::: {1..1000}

You would do something like:

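use strict;
use warnings;
use Image::PHash;

for (1..1000) {
    # Load each image through Imlib2, the fastest backend the module supports
    my $iph = Image::PHash->new("$_.jpg", "Imlib2");
    my $p = $iph->pHash();    # perceptual hash of this image
    print "$_ $p\n";
}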

This uses the fastest of the image libraries supported by the module (it requires Imlib2 and Image::Imlib2 to be installed). As you would in Python, you can easily parallelize it; the version below takes just 1.3s on my M1 Mac Mini for those 1000 images (vs 5.8s on a single thread):

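use strict;
use warnings;
use Image::PHash;
use MCE::Loop;

# Spread the work over 8 worker processes; MCE picks the chunk size
MCE::Loop::init {
    max_workers => 8,
    chunk_size  => 'auto'
};

# Each worker receives a chunk of image numbers and hashes them in turn.
# Output order is not guaranteed, since the workers run concurrently.
mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    for (@{$chunk_ref}) {
        my $iph = Image::PHash->new("$_.jpg", "Imlib2");
        my $p = $iph->pHash();
        print "$_ $p\n";
    }
} (1..1000);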

So, instead of being slower than dhash, it's quite a bit faster.

Sorry for not being more helpful with Python.
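That said, the same worker-pool idea carries over to Python. Below is a minimal sketch, not the code from the linked post: it assumes the third-party Pillow and ImageHash packages are installed, and the helper names `phash_file` and `hash_files` are made up for illustration. Since `imagehash.phash` and `imagehash.dhash` both take a PIL image, swapping one for the other in such a helper should be a one-word change.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def phash_file(path):
    """Return the perceptual hash of one image file as a string."""
    # Assumption: Pillow and ImageHash are installed. They are imported
    # lazily so that each worker process loads them on first use.
    from PIL import Image
    import imagehash

    with Image.open(path) as img:
        # imagehash.dhash(img) accepts the same argument, so it can be
        # substituted here to compare the two algorithms.
        return str(imagehash.phash(img))


def hash_files(paths, hash_fn=phash_file, workers=8, use_processes=True):
    """Apply hash_fn to every path using a pool of workers.

    Returns a dict mapping each path to its hash, in input order.
    """
    paths = list(paths)
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    # Hand files to workers in batches to cut per-task overhead
    # (the chunksize argument is ignored by thread pools).
    chunksize = max(1, len(paths) // (workers * 4))
    with pool_cls(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(hash_fn, paths, chunksize=chunksize)))
```

You would call it as `hash_files(f"{n}.jpg" for n in range(1, 1001))`. With process workers, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters, and `hash_fn` has to be a module-level function so it can be pickled. I have not timed this, so whether it gets near the Perl numbers above is an open question.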