Elasticsearch Aggregation with hamming distance of a phash

657 Views Asked by David Kaplan At 23 November 2018 at 13:36

I'm trying to group together similar documents that have matching keyword field values and matching phashes of their related images. At the moment I have the following, which works well for exactly matching phashes:

        'duplicate_docs':
            A('terms',
              script={
                  "lang": "painless",
                  "inline": "def term = doc['make'] + '' + doc['model'] + doc['province'] + doc['mileage']; return term + '' + doc['image_hash'];"
              }),
    }, {'dup_docs': A('top_hits', size=20)}):

However, some of the images are slightly different, and the whole point of a phash is that you can use the Hamming distance between two hashes to figure out how different the two images are.
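To make the comparison concrete, here is a minimal sketch of the Hamming distance between two phashes. It assumes the hashes are stored as equal-length hex strings; the actual format of the `image_hash` field is not shown here, and the function name `hamming_distance` is only illustrative.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree; count those bits.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Identical hashes are at distance 0; a single flipped bit gives distance 1.
same = hamming_distance("ff00", "ff00")
one_bit = hamming_distance("ff00", "ff01")
```

Two images would then count as near-duplicates when the distance is at or below some small threshold chosen for the hash length.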

I realise that grouping by Hamming distance probably makes the calculation vastly more expensive, as it essentially needs to compare every image against every other image, which seems excessive, but I'm unsure how else I could go about this. Thanks


There is 1 best solution below

dragon.warrior.nyc On 26 December 2019 at 03:38

You may want to try this out:

Mu, C., Zhao, J., Yang, G., Yang, B. and Yan, Z., 2019, October. Fast and Exact Nearest Neighbor Search in Hamming Space on Full-Text Search Engines. In International Conference on Similarity Search and Applications (pp. 49-56). Springer, Cham.

The FENSHSES method proposed in the above paper can efficiently find all r-neighbors in Hamming space without scanning all documents.
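As a rough illustration of how neighbors can be found without scanning everything, the sketch below uses the generic pigeonhole filter: if two codes differ in at most r bits and each code is split into r + 1 disjoint chunks, at least one chunk must match exactly. This is not taken from the paper and is not its implementation; the helper names (`split_chunks`, `build_index`, `neighbors`) and the in-memory dictionary index are assumptions made for the example.

```python
from collections import defaultdict


def split_chunks(code: int, bits: int, parts: int) -> list:
    """Split a `bits`-wide integer into `parts` disjoint chunks, low bits first."""
    if parts > bits:
        raise ValueError("cannot split into more chunks than there are bits")
    base, extra = divmod(bits, parts)
    chunks, shift = [], 0
    for i in range(parts):
        width = base + (1 if i < extra else 0)  # spread leftover bits evenly
        chunks.append((code >> shift) & ((1 << width) - 1))
        shift += width
    return chunks


def build_index(codes: dict, bits: int, radius: int) -> dict:
    """Map (chunk position, chunk value) to the set of document ids having it."""
    index = defaultdict(set)
    for doc_id, code in codes.items():
        for pos, chunk in enumerate(split_chunks(code, bits, radius + 1)):
            index[(pos, chunk)].add(doc_id)
    return index


def neighbors(query: int, codes: dict, index: dict, bits: int, radius: int) -> list:
    """Return ids of all codes within `radius` bits of `query`."""
    candidates = set()
    # Any true neighbor shares at least one exact chunk with the query.
    for pos, chunk in enumerate(split_chunks(query, bits, radius + 1)):
        candidates |= index.get((pos, chunk), set())
    # Verify candidates with the exact Hamming distance to drop false positives.
    return sorted(d for d in candidates
                  if bin(codes[d] ^ query).count("1") <= radius)


# Hypothetical 16-bit codes keyed by document id.
codes = {"a": 0x0000, "b": 0x0001, "c": 0x00FF, "d": 0x8001}
index = build_index(codes, bits=16, radius=2)
result = neighbors(0x0000, codes, index, bits=16, radius=2)
```

On a search engine, each chunk could plausibly be stored as its own keyword field so that an exact-match query retrieves the candidates, with the exact distance check applied only to that smaller set. That mapping is an assumption, not something stated above.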
