MediaWiki API: Get the most relevant images for a bunch of keywords (and filter out PDFs/PDF thumbs)

72 Views Asked by e-motiv At 14 May 2023 at 16:39

I finally found a way to get images for a bunch of keywords where the resulting images don't have to match every single keyword, as long as I get some images. However, I am not sure I chose the best API parameters to get the most results while still keeping them relevant.
For example, for the keywords "apples, granny, smith" I would like a lot of images of shiny green apples, and possibly other apples.

My relevant parameters are as follows (using https://commons.wikimedia.org/w/api.php):
&action=query &generator=search &gsrsearch=File:[keywords] &prop=pageimages
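
To make the request above concrete, here is a minimal sketch that assembles the same parameters into a URL. It is in Python purely for illustration (the same parameters can be sent from PHP); `format=json`, `gsrlimit`, and the helper names `build_search_params` and `build_search_url` are assumptions that are not part of the question. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"

def build_search_params(keywords, limit=20):
    """Return the query parameters from the question as a dict."""
    return {
        "action": "query",
        "format": "json",                           # assumption: JSON output is wanted
        "generator": "search",
        "gsrsearch": "File:" + " ".join(keywords),  # keywords joined with spaces
        "gsrlimit": limit,                          # assumption: cap the number of hits
        "prop": "pageimages",
    }

def build_search_url(keywords, limit=20):
    """Return the full request URL with properly escaped parameters."""
    return API_URL + "?" + urlencode(build_search_params(keywords, limit))

print(build_search_url(["apples", "granny", "smith"]))
```

Building the query string with `urlencode` rather than by hand keeps characters such as `:`, `|`, and spaces correctly escaped when the keyword format changes.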

But there are many other possible options, and testing all their combinations would mean roughly 1000 tests. So I hope someone has more in-depth knowledge.

Examples of other options, which can be combined in many ways:

1. [keywords] joined with pipes or spaces: "Apples granny smith" or "Apples|granny|smith"
   Or is there another way of combining keywords?
2. Different generators or queries:
   1. &generator=images &redirects=1 &titles=[keywords] (is titles the only option here?)
   2. &action=opensearch &search=[keywords]
   3. &generator=search
   4. .. ? ..
3. Different props (1):
   1. &prop=pageimages
   2. &prop=images
   3. &prop=imageinfo
   4. .. ? ..
4. Different kinds of searches:
   1. &(gsr)search=File:[keywords]
   2. &(gsr)search=[keywords]&(gsr)namespace=6
   3. &titles=[keywords] (2)
   4. &titles=File:[keywords] (2)

I know NOT ALL combinations make sense, but there are still too many to test.
(1) If props other than images somehow make sense, I can get the image in my own code, no problem.
(2) 4.3 and 4.4 are not really searches, but maybe they could be with wildcards or regex? I didn't understand that when I came across it somewhere in the API docs or in a web search.
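
One way to keep the testing manageable is to generate a small grid of the variants listed above and compare their results side by side, instead of trying every combination by hand. The sketch below only builds the parameter sets; it does not call the API. The function name `search_variants` and the labels are made up for illustration, and the `OR`-joined form is an assumption about what the search backend accepts, not something confirmed in the question.

```python
from itertools import product

def search_variants(keywords):
    """Return (label, params) pairs for a small grid of request variants."""
    joins = {
        "spaces": " ".join(keywords),
        "pipes": "|".join(keywords),
        "or": " OR ".join(keywords),  # assumption: the search backend accepts OR
    }
    scopes = {
        "file-prefix": lambda q: {"gsrsearch": "File:" + q},
        "namespace-6": lambda q: {"gsrsearch": q, "gsrnamespace": 6},
    }
    props = ["pageimages", "imageinfo"]

    variants = []
    for (join_name, query), (scope_name, scope), prop in product(
        joins.items(), scopes.items(), props
    ):
        params = {"action": "query", "generator": "search", "prop": prop}
        params.update(scope(query))
        variants.append((f"{join_name}/{scope_name}/{prop}", params))
    return variants

for label, params in search_variants(["apples", "granny", "smith"]):
    print(label, params)
```

This yields 3 x 2 x 2 = 12 parameter sets, which is small enough to run once per keyword set and judge by eye.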

Preferably I would not like to get PDFs or PDF thumbnails as results, but if that's not possible with the best combination here, I can filter them out in PHP.
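
The client-side fallback could look roughly like this. It is a sketch in Python for illustration (the same logic ports directly to PHP) and assumes the request used `prop=imageinfo` with `iiprop=url|mime`, so that each page carries a MIME type. The function names and the sample response are made up; the sample only mimics the shape of a query result.

```python
def iter_pages(response):
    """Yield page objects from a query response.

    Handles both shapes: a dict keyed by page id (the default format)
    and a list (formatversion=2).
    """
    pages = response.get("query", {}).get("pages", [])
    return pages.values() if isinstance(pages, dict) else pages

def non_pdf_image_urls(response):
    """Return file URLs whose MIME type is image/*, which drops application/pdf."""
    urls = []
    for page in iter_pages(response):
        for info in page.get("imageinfo", []):
            if info.get("mime", "").startswith("image/") and "url" in info:
                urls.append(info["url"])
    return urls

# Made-up response for illustration only.
sample = {
    "query": {
        "pages": {
            "1": {"title": "File:Example apple.jpg",
                  "imageinfo": [{"url": "https://example.org/apple.jpg",
                                 "mime": "image/jpeg"}]},
            "2": {"title": "File:Example report.pdf",
                  "imageinfo": [{"url": "https://example.org/report.pdf",
                                 "mime": "application/pdf"}]},
        }
    }
}

print(non_pdf_image_urls(sample))
```

Filtering on the MIME type of the file itself, rather than on the thumbnail URL, avoids being fooled by PDF thumbnails that are served as JPEGs. The search backend on Commons also documents a `filemime:` keyword, so adding something like `-filemime:pdf` to the search string may achieve the same thing server-side, but that is untested here.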
