MediaWiki API: Get the most relevant images for a bunch of keywords (and filter out PDFs/PDF thumbnails)


I finally found a way to get images for a set of keywords where the resulting images don't have to match every single keyword, as long as I get some images. However, I am not sure I picked the best API parameters to get as many results as possible while keeping them relevant.
For example, for the keywords "apples, granny, smith" I would like a lot of images of shiny green apples and possibly other apples.

My relevant parameters are as follows (using https://commons.wikimedia.org/w/api.php):
&action=query &generator=search &gsrsearch=File:[keywords] &prop=pageimages
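
For reference, the parameters above could be assembled into a full request URL roughly as in the sketch below. It assumes the request uses `action=query`; the helper name `build_search_url` and the `format=json` and `gsrlimit` values are illustrative additions of mine, not part of my current setup.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_search_url(keywords, limit=20):
    """Build a generator=search request URL (hypothetical helper, no network call)."""
    params = {
        "action": "query",
        "format": "json",        # assumption: JSON output is wanted
        "generator": "search",
        # Space-separated keywords, prefixed with File: as in the question.
        "gsrsearch": "File:" + " ".join(keywords),
        "gsrlimit": limit,       # assumption: number of search hits to request
        "prop": "pageimages",
    }
    return API_URL + "?" + urlencode(params)


print(build_search_url(["apples", "granny", "smith"]))
```

Printing the URL first makes it easy to paste into a browser and compare parameter combinations by hand before fetching anything in code.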

But there are so many other options possible and testing all their combinations would mean 1000 tests more or less. So I hope someone has more in-depth knowledge.

Example of other options which can be combined in many ways:

  1. [keywords] separated by pipes or spaces: "Apples granny smith" or "Apples|granny|smith",
    or is there another way to combine keywords?
  2. Different generators or query:
    1. &generator=images &redirects=1 &titles=[keywords] (is titles the only option here?)
    2. &action=opensearch &search=[keywords]
    3. &generator=search
    4. .. ? ..
  3. Different props (1)
    1. &prop=pageimages
    2. &prop=images
    3. &prop=imageinfo
    4. .. ? ..
  4. Different kind of searches
    1. &(gsr)search=File:[keywords]
    2. &(gsr)search=[keywords]&(gsr)namespace=6
    3. &titles=[keywords] (2)
    4. &titles=File:[keywords] (2)

I know NOT ALL combinations make sense, but there are still too many to test.
(1) If props other than images make sense, I can derive the image from them in my own code, no problem.
(2) 4.3 and 4.4 are not really searches, but maybe they could be with wildcards or regex? I didn't understand that when I came across it in the API docs or a web search.
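
To make options 1 and 4 concrete: as far as I understand, a pipe in `titles` separates exact page titles rather than search terms, while `gsrsearch` takes a search query string in which space-separated words must all match by default and the `OR` keyword loosens that. The sketch below only builds the two variants of the parameters so they can be compared; the function names are hypothetical, and whether `OR` really gives more relevant results on Commons is exactly what I would still need to test.

```python
def search_query(keywords, match_all=True):
    """Join keywords for gsrsearch: spaces require all words, OR accepts any word."""
    joiner = " " if match_all else " OR "
    return joiner.join(keywords)


def search_params(keywords, match_all=True):
    """Parameters for option 4.2: plain keywords restricted to the File namespace."""
    return {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": search_query(keywords, match_all),
        "gsrnamespace": 6,          # namespace 6 is File:
        "prop": "imageinfo",
        "iiprop": "url|mime",       # assumption: URL and MIME type are enough here
    }


strict = search_params(["apples", "granny", "smith"])
loose = search_params(["apples", "granny", "smith"], match_all=False)
print(strict["gsrsearch"])
print(loose["gsrsearch"])
```

Requesting `iiprop=mime` alongside the URL would also give the file type for each hit, which helps with the PDF problem.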

Preferably I would not like to have PDFs or thumbnails of PDFs in the results, but if that's not possible with the best combination here, I can filter them out in PHP.
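
The client-side fallback could look roughly like this (shown in Python rather than PHP to keep it short). It assumes the default JSON layout, where `query.pages` is an object keyed by page ID, and that `imageinfo` with `iiprop=mime` was requested; when the MIME type is missing it falls back to the file extension in the title. The function name and the sample data are made up for illustration. A server-side alternative might be adding the `-filemime:pdf` search keyword to the query, but I have not verified how that behaves here.

```python
def drop_pdf_pages(pages):
    """Return the pages that are not PDFs, judged by MIME type or title extension."""
    kept = []
    for page in pages.values():
        # imageinfo is a list with one entry per file revision; use the first if present.
        info = (page.get("imageinfo") or [{}])[0]
        mime = info.get("mime", "")
        title = page.get("title", "")
        if mime == "application/pdf" or title.lower().endswith(".pdf"):
            continue
        kept.append(page)
    return kept


# Made-up sample shaped like the "pages" object of a query response.
sample_pages = {
    "1": {"title": "File:Granny Smith apple.jpg", "imageinfo": [{"mime": "image/jpeg"}]},
    "2": {"title": "File:Apple report.pdf", "imageinfo": [{"mime": "application/pdf"}]},
    "3": {"title": "File:Apples.png"},
}
for page in drop_pdf_pages(sample_pages):
    print(page["title"])
```

Checking the title as well as the MIME type also covers `prop=pageimages` results, where a PDF only shows up as a rendered thumbnail of its first page.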
