Loading 220 million triples in AnzoGraph


I've got a dataset with 220 million triples in one TTL file. Is there a way I can upload this data into AnzoGraph?

In the AnzoGraph documentation, https://docs.cambridgesemantics.com/anzograph/userdoc/load-reqs.htm, I came across the text below:

> AnzoGraph supports a maximum URI length of 16K characters. There is also a limit of 64K on the number of unique URIs you can load into AnzoGraph. That is, the number of unique URIs, including graph URIs and predicate URIs, that you can load into AnzoGraph must be less than 64K. If you exceed this limit, the Load operation exceeding the limit will fail and AnzoGraph returns the message "m_lowest_unused_index <= a_max_value()".

With a limit of 64K unique URIs, I expect the upload of 220 million triples to fail, especially since it's a linking dataset that connects multiple sources and therefore contains lots of unique URIs.

Is there a way around this limitation?

> 220 million triples, in one TTL file.

Loading from a single TTL file is very slow because only a single CPU core is engaged to ingest the data. If you can load the data just once into a graph, e.g. <yourgraph>, then use the command

`COPY <yourgraph> TO <dir:/mydir/myfiles.ttl.gz>`

which will split your dataset into many gzip-compressed TTL files. The next time you load, ingest MPP-style from that data directory instead, so that every CPU core in your AnzoGraph server or cluster loads a subset of the data in parallel.

I should also note that 220 million triples is actually a very small dataset for AnzoGraph. I have loaded over 100 million on my T470s ThinkPad while just fiddling around, a single server-class system will easily handle billions, and a large cluster was tested to over a trillion triples in a record-breaking LUBM run some years ago. Typical production use cases are in the tens of billions.
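For reference, here is a minimal sketch of the whole sequence. The graph URI `<http://example.org/yourgraph>` and the paths under `/mydir` are placeholders; adjust them to your environment, and make sure the directory is readable and writable by the AnzoGraph process:

```sparql
# One-time load from the single TTL file (slow: single-core ingest)
LOAD <file:/mydir/mydata.ttl> INTO GRAPH <http://example.org/yourgraph>

# Export the graph as many gzip-compressed TTL files
COPY <http://example.org/yourgraph> TO <dir:/mydir/myfiles.ttl.gz>

# Subsequent loads read the whole directory in parallel,
# with the files distributed across all CPU cores
LOAD <dir:/mydir/myfiles.ttl.gz> INTO GRAPH <http://example.org/yourgraph>
```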

Disclaimer: I work for Cambridge Semantics.