Wikidata Virtuoso SPARQL Endpoint - How to get more than 100,000 results


I need to get Wikidata artifacts (instance-types, redirects and disambiguations) for a project.

Because the original Wikidata endpoint enforces time limits on queries, I have turned to the Virtuoso Wikidata endpoint.

The problem is that when I try to get, for example, the redirects with this query, it returns at most 100,000 results:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {?resource owl:sameAs ?resource2}
WHERE
{
?resource owl:sameAs ?resource2
}

Is there any way to get more than 100,000 results? Ideally I would like to retrieve the complete result set.

Once the results are obtained, I need 3 files (or as few files as possible) in N-Triples format: wikidata_intance_types.nt, wikidata_redirecions.nt and wikidata_disambiguations.nt.

Thank you very much in advance.

All the best,

Jose Manuel


There are 2 best solutions below


Please recognize that in both cases (Wikidata itself, and the Virtuoso instance provided by OpenLink Software, my employer), you are querying against a shared resource, and various limits should be expected.

You should space your queries out over time, and consider smaller chunks than the 100,000 limit you've run into -- perhaps 50,000 at a time, waiting for each query to finish retrieving results, plus another second or ten, before issuing the next query.
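To make the pacing advice concrete, here is a minimal Python sketch of such a loop. It is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable that you would write yourself (it sends one query for the given offset and chunk size and returns the rows it received), and the 50,000 chunk size and 5 second pause are example values, not limits published by any endpoint.

```python
import time
from typing import Callable, Iterator, List


def fetch_in_chunks(
    fetch_page: Callable[[int, int], List[str]],  # hypothetical: (offset, limit) -> rows
    chunk_size: int = 50_000,       # example value, below the 100,000 cap
    pause_seconds: float = 5.0,     # example pause between queries
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[str]]:
    """Yield one chunk at a time, pausing before each follow-up query."""
    offset = 0
    while True:
        # Blocks until the whole chunk has been retrieved.
        rows = fetch_page(offset, chunk_size)
        if not rows:
            break                   # nothing left to fetch
        yield rows
        if len(rows) < chunk_size:
            break                   # a short chunk means this was the last one
        offset += chunk_size
        sleep(pause_seconds)        # be friendly to the shared endpoint
```

Passing `sleep` as a parameter is just a convenience so the loop can be exercised without real waiting; in normal use the default `time.sleep` applies.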

Most of the guidance in this article about working with the DBpedia public SPARQL endpoint is relevant for any public SPARQL endpoint, especially those powered by Virtuoso. Specific settings on other endpoints will vary, but if you try to be friendly — by limiting the rate of your queries; limiting the size of partial result sets when using ORDER BY, LIMIT, and OFFSET to step through to get a full result set for a query that overflows the instance's maximum result set size; and the like — you'll be far more successful.
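As a rough illustration of stepping through a result set with ORDER BY, LIMIT, and OFFSET, the following Python sketch generates one paged version of the question's redirect query per page. The function names `paged_construct_query` and `page_queries` are made up for this example, and the page size is an assumption; the ORDER BY clause is there so that successive pages are cut from a stable ordering.

```python
from typing import Iterator


def paged_construct_query(limit: int, offset: int) -> str:
    """Return the owl:sameAs CONSTRUCT query restricted to one page."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "CONSTRUCT { ?resource owl:sameAs ?resource2 }\n"
        "WHERE { ?resource owl:sameAs ?resource2 }\n"
        "ORDER BY ?resource ?resource2\n"   # stable order so pages do not overlap
        f"LIMIT {limit}\n"
        f"OFFSET {offset}"
    )


def page_queries(page_size: int, pages: int) -> Iterator[str]:
    """Yield the query text for each of the first `pages` pages."""
    for page in range(pages):
        yield paged_construct_query(page_size, page * page_size)


if __name__ == "__main__":
    # Print the first two pages with an example page size of 50,000.
    for query in page_queries(page_size=50_000, pages=2):
        print(query)
        print("---")
```

Each generated query would be sent on its own, with the next one issued only after the previous result has been fully retrieved and a short pause has passed.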


You can get and host your own copy of Wikidata, as explained in https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData

There are also tools for getting a partial dump of Wikidata, e.g. https://github.com/bennofs/wdumper

Or ask for access to one of the non-public copies we run by sending me a personal e-mail via my RWTH Aachen i5 account.