Matching specific Geonames IDs with Wikidata IDs using Pywikibot


I have an extensive list of Geonames IDs for which I want to find the matching Wikidata IDs. I would like to use Pywikibot and, if possible, iterate over the list.

The SPARQL query for an individual Geonames ID would be:

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P1566 ?statement0.
      ?statement0 (ps:P1566) "2867714".
    }
  }
}

2867714 is the Geonames ID for Munich, and running the query via the following script returns the correct Wikidata ID:

import pywikibot
from pywikibot import pagegenerators as pg

# read query file

with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
    QUERY = query_file.read()
    #print(QUERY)
    
# create generator based on query
# returns an iterator that produces a sequence of values when iterated over
# useful when creating large sequences of values

wikidata_site = pywikibot.Site("wikidata", "wikidata")
generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)

print(generator)

# OUTPUT: <generator object WikidataSPARQLPageGenerator.<locals>.<genexpr> at 0x00000169FAF3FD10>

# iterate over generator

for item in generator:
    print(item)

The correct output returned is: wikidata:Q32664319

Ideally, I want to replace the specific ID with a variable so I can feed in the IDs from my list one after another. I checked the Pywikibot documentation but could not find anything on this use case. How can I replace the hard-coded ID with a variable and iterate over my ID list?

Best answer, by logi-kal:

First, why do you use a subquery? You can drop it and collapse the two triple patterns into a property path:

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "2867714".
}

Coming to your question, you can use Python's %-style string formatting to generalize your query:

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "%s".
}

and then instantiate it as QUERY % "2867714".
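Plain `%` formatting is a convenient fit here because the query text is full of `{` and `}` braces: with `str.format()` every literal brace would have to be doubled (`{{`, `}}`), whereas `%s` leaves them alone. A minimal, self-contained check, with the template inlined instead of read from the `.rq` file:

```python
# Query template with a %s placeholder where the Geonames ID goes
QUERY = """SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
  ?item p:P1566/ps:P1566 "%s".
}"""

# %-formatting replaces only the %s; the SPARQL braces are left untouched
query_for_munich = QUERY % "2867714"
print(query_for_munich)
```

If the IDs might come from untrusted input, validate them (for example with `str.isdigit()`) before substituting, since `%` inserts the text verbatim into the query.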

With a list of IDs, it would look something like this:

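import pywikibot
from pywikibot import pagegenerators as pg

# the .rq file now holds the template with "%s" in place of the ID
with open('C:\\Users\\p70076654\\Downloads\\SPARQL_mapGeonamesID.rq', 'r') as query_file:
    QUERY = query_file.read()

# create the site once, outside the loop
wikidata_site = pywikibot.Site("wikidata", "wikidata")
geonames_ids = ["2867714", "2867715", "2867716"]
for geonames_id in geonames_ids:
    # substitute the current Geonames ID into the query template
    generator = pg.WikidataSPARQLPageGenerator(QUERY % geonames_id, site=wikidata_site)
    ...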
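For an extensive list, sending one request per ID can be slow and may run into the query service's rate limits. An alternative, sketched here under stated assumptions, is to send the IDs in batches with a SPARQL `VALUES` clause and return the Geonames ID alongside each item, so every result is already paired with its input. The helper names (`build_batch_query`, `rows_to_mapping`, `fetch_mapping`) and the batch size of 200 are illustrative choices, not part of Pywikibot. The network step uses `pywikibot.data.sparql.SparqlQuery.select()`, which returns one dict per result row, with item values as full entity URIs.

```python
def build_batch_query(geonames_ids):
    """Build one SPARQL query that looks up many Geonames IDs at once."""
    for gid in geonames_ids:
        # Geonames IDs are numeric; rejecting anything else keeps the query well-formed
        if not gid.isdigit():
            raise ValueError(f"not a Geonames ID: {gid!r}")
    values = " ".join(f'"{gid}"' for gid in geonames_ids)
    return (
        "SELECT DISTINCT ?geonames ?item WHERE {\n"
        f"  VALUES ?geonames {{ {values} }}\n"
        "  ?item p:P1566/ps:P1566 ?geonames.\n"
        "}"
    )


def rows_to_mapping(rows):
    """Turn SparqlQuery.select() rows into {geonames_id: [Q-IDs]}."""
    mapping = {}
    for row in rows:
        # item values come back as URIs such as http://www.wikidata.org/entity/Q...
        qid = row["item"].rsplit("/", 1)[-1]
        mapping.setdefault(row["geonames"], []).append(qid)
    return mapping


def fetch_mapping(geonames_ids, batch_size=200):
    """Query Wikidata in batches; needs Pywikibot and network access."""
    # imported here so the helpers above can be used without Pywikibot installed
    import pywikibot
    from pywikibot.data import sparql

    query_service = sparql.SparqlQuery(repo=pywikibot.Site("wikidata", "wikidata"))
    mapping = {}
    for start in range(0, len(geonames_ids), batch_size):
        batch = geonames_ids[start:start + batch_size]
        # select() returns None if the request fails, so fall back to no rows
        rows = query_service.select(build_batch_query(batch)) or []
        for gid, qids in rows_to_mapping(rows).items():
            mapping.setdefault(gid, []).extend(qids)
    return mapping
```

Usage, assuming Pywikibot is installed and configured: `fetch_mapping(geonames_ids)` returns a dict mapping each matched Geonames ID to a list of Q-IDs; IDs without a match are simply absent. The lookup uses the same `p:P1566/ps:P1566` path as the simplified query, so statements of any rank are matched.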