I finally found a way to get images for a set of keywords where the results don't have to match every single keyword, as long as I get some images. However, I am not sure I chose the best API parameters to get the most results while still keeping them relevant.
For example, for the keywords "apples, granny, smith" I would like a lot of images of shiny green apples, and possibly other apples as well.
My relevant parameters are as follows (using https://commons.wikimedia.org/w/api.php):
&action=query &generator=search &gsrsearch=File:[keywords] &prop=pageimages
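Spelled out as a full request, this is roughly what I am sending. It is only a sketch in Python to make the parameters concrete: `build_search_url` is a name made up for this post, and `format=json` and `gsrlimit` are assumptions that are not part of my parameter list above.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_search_url(keywords, limit=20):
    """Build the generator=search request described in this post."""
    params = {
        "action": "query",
        "format": "json",        # assumption: JSON output
        "generator": "search",
        "gsrsearch": "File:" + " ".join(keywords),
        "gsrlimit": limit,       # assumption: not in my original parameters
        "prop": "pageimages",
    }
    # urlencode percent-encodes the colon and turns spaces into '+'
    return API_URL + "?" + urlencode(params)


url = build_search_url(["apples", "granny", "smith"])
print(url)
```

The keywords are joined with spaces here; whether that or the pipe form gives better results is exactly what I am unsure about.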
But there are so many other possible options that testing all their combinations would mean roughly 1000 tests. So I hope someone has more in-depth knowledge.
Examples of other options, which can be combined in many ways:
- [keywords] with pipe or spaces: "Apples granny smith" or "Apples|granny|smith", or is there another way to combine keywords?
- Different generators or queries:
  - &generator=images &redirects=1 &titles=[keywords] (is titles the only option here?)
  - &action=opensearch &search=[keywords]
  - &generator=search
  - .. ? ..
- Different props (1)
  - &prop=pageimages
  - &prop=images
  - &prop=imageinfo
  - .. ? ..
- Different kinds of searches
  - &(gsr)search=File:[keywords]
  - &(gsr)search=[keywords]&(gsr)namespace=6
  - &titles=[keywords] (2)
  - &titles=File:[keywords] (2)
I know that NOT ALL combinations make sense, but there are still too many to test.
(1) If props other than images also make sense, that's no problem: I can extract the image in my own algorithm.
(2) The two titles options marked (2) are not really searches, but maybe they could be with wildcards or regex? I didn't understand that when I came across it somewhere in the API docs or in a web search.
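To show why the number of tests grows so quickly, here is a minimal sketch that enumerates only the generator=search variants from the list above. The names `KEYWORD_JOINS`, `SEARCH_STYLES`, `PROPS` and `candidate_requests` are invented for this post, `format=json` is an assumption, and the sketch only builds parameter sets; it does not send anything or judge relevance.

```python
from itertools import product

# Ways of joining the keywords, as listed above.
KEYWORD_JOINS = {"spaces": " ", "pipe": "|"}

# The two search styles: "File:" prefix vs. restricting to namespace 6.
SEARCH_STYLES = {
    "file_prefix": lambda q: {"gsrsearch": "File:" + q},
    "namespace_6": lambda q: {"gsrsearch": q, "gsrnamespace": 6},
}

PROPS = ["pageimages", "images", "imageinfo"]


def candidate_requests(keywords):
    """Yield (label, params) for every combination of the options above."""
    for (join_name, sep), (style_name, style), prop in product(
        KEYWORD_JOINS.items(), SEARCH_STYLES.items(), PROPS
    ):
        params = {
            "action": "query",
            "format": "json",  # assumption: JSON output
            "generator": "search",
            "prop": prop,
        }
        params.update(style(sep.join(keywords)))
        yield (join_name, style_name, prop), params


combos = list(candidate_requests(["apples", "granny", "smith"]))
print(len(combos))  # 2 joins * 2 search styles * 3 props = 12
```

That is already 12 requests for a single generator, before adding the other generators, opensearch, or any further props.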
Preferably I would not like to get PDFs or thumbnails of PDFs as results, but if that's not possible with the best combination here, I can filter them out in PHP.
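The fallback filter I have in mind looks roughly like this, sketched in Python rather than PHP to keep it short. It assumes the request used &prop=imageinfo with &iiprop=mime, so that each page carries an `imageinfo` list with a `mime` field, and it falls back to the title's extension when that is missing. `drop_pdfs` and the sample pages are made up for illustration; they are not real API output.

```python
def drop_pdfs(pages):
    """Keep only pages whose file is not a PDF, judged by MIME type or title."""
    kept = []
    for page in pages:
        # imageinfo is a list with one entry per file revision; use the first
        info = (page.get("imageinfo") or [{}])[0]
        mime = info.get("mime", "")
        title = page.get("title", "")
        if mime == "application/pdf" or title.lower().endswith(".pdf"):
            continue
        kept.append(page)
    return kept


# Made-up sample shaped like the values of query.pages in a JSON response.
sample = [
    {"title": "File:Granny smith.jpg", "imageinfo": [{"mime": "image/jpeg"}]},
    {"title": "File:Apple varieties.pdf", "imageinfo": [{"mime": "application/pdf"}]},
    {"title": "File:Apples.PDF"},
]
print([p["title"] for p in drop_pdfs(sample)])
```

If the search itself can exclude PDFs, I would prefer that over filtering afterwards.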