Return cities in Wikidata SPARQL Query, similar to a Wikipedia page


I'm not sure what I'm doing wrong. My query returns a sizeable list, but the cities are duplicated, and I'm unsure how the items in it come to be classed as cities. I would expect to see London in the results, and for the results to resemble this Wikipedia page. Instead, they are quite different from it.

I want to:

  1. Get a list of cities with their first-level administrative country subdivision (province/state/region), similar to this Wikipedia page.
  2. Avoid duplicate cities.

SELECT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE 
{
    ?city wdt:P31/wdt:P279 wd:Q515 .  # find instances of subclasses of city
    ?city (wdt:P131) ?region.
    ?region wdt:P31/wdt:P279 wd:Q10864048 .
    ?city wdt:P1082 ?population .
    ?city wdt:P17 ?country .  # Also find the country of the city
    ?city p:P625 ?statement . # coordinate-location statement
    ?statement psv:P625 ?coordinate_node .
    OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
    OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}

    FILTER (?population > 100000) .
    # choose language
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
} 
LIMIT 8000

Try it
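
One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the path `wdt:P31/wdt:P279` follows exactly one "subclass of" step, so it only matches items whose class is a direct subclass of city (Q515). Items typed directly as Q515, or as a deeper subclass, are skipped. The variant below uses `wdt:P279*` (zero or more steps) instead. It assumes the Wikidata Query Service, where the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined, and the `LIMIT 100` is an arbitrary value chosen to keep a trial run small.

```sparql
# Sketch only: assumes the Wikidata Query Service and its predefined prefixes.
SELECT DISTINCT ?city ?cityLabel
WHERE
{
    # P279* follows zero or more "subclass of" links, so this matches
    # direct instances of city (Q515) and instances of any subclass of it.
    ?city wdt:P31/wdt:P279* wd:Q515 .
    ?city wdt:P1082 ?population .

    FILTER (?population > 100000)

    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
}
LIMIT 100  # arbitrary small limit for a trial run
```

The same `*` could be tried on the `?region` pattern. Whether either change brings London into the results depends on how that item is modelled in Wikidata, which is not verified here.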

Update:

Although it is not an answer to this specific question, anyone trying to get similar data should have a look here.

Update 2:

With help in the comments from @UninformedUser, the query is now:

SELECT DISTINCT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE 
{
    ?city wdt:P31/wdt:P279 wd:Q515 .  # find instances of subclasses of city
    ?city (wdt:P131) ?region.
    ?region wdt:P31/wdt:P279 wd:Q10864048 .
    ?city p:P1082 ?populationStmt .    
    ?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .  
    ?city wdt:P17 ?country .  # Also find the country of the city
    ?city p:P625 ?statement . # coordinate-location statement
    ?statement psv:P625 ?coordinate_node .
    OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
    OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
      
    FILTER NOT EXISTS {     
      ?city p:P1082/pq:P585 ?pop_date_ .     
      FILTER (?pop_date_ > ?pop_date)     
    }

    FILTER (?population > 100000) .
    # choose language
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
} 
LIMIT 8000

Try it
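
A possible way to collapse any remaining duplicates, sketched under the assumption that they come from cities with several region, country, population, or coordinate values: aggregate per city in a subquery, then apply the label service to the aggregated result. The short inner variable names (`?r`, `?k`, `?pop`, `?c`) and the `?coord` output column are made up for illustration. This sketch also swaps in different techniques from the query above: it reads the truthy `wdt:P1082` values and keeps the largest one, instead of picking the most recent statement by date, and it returns the coordinate as a single WKT point rather than separate latitude and longitude.

```sparql
# Sketch only: assumes the Wikidata Query Service and its predefined prefixes.
SELECT ?city ?cityLabel ?country ?countryLabel ?region ?regionLabel ?population ?coord
WHERE
{
    {
        # One row per city: SAMPLE picks an arbitrary value when a city has several.
        SELECT ?city
               (SAMPLE(?r) AS ?region)
               (SAMPLE(?k) AS ?country)
               (MAX(?pop) AS ?population)   # largest value, not necessarily the latest
               (SAMPLE(?c) AS ?coord)       # WKT literal of the form Point(long lat)
        WHERE
        {
            ?city wdt:P31/wdt:P279 wd:Q515 ;
                  wdt:P131 ?r ;
                  wdt:P17 ?k ;
                  wdt:P1082 ?pop ;
                  wdt:P625 ?c .
            ?r wdt:P31/wdt:P279 wd:Q10864048 .

            FILTER (?pop > 100000)
        }
        GROUP BY ?city
    }

    # Labels are resolved after aggregation, so each city keeps a single row.
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
}
LIMIT 8000
```

Because `SAMPLE` is arbitrary, a city located in several matching regions will show only one of them. If the choice matters, the region pattern would need a stricter condition instead.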
