Jena Rule Engine with TDB


I have my data loaded into a TDB-backed model and have written some Jena rules to apply to it. I then store the inferred data in a new TDB store.

I tried this on a small dataset (~200 KB) and it worked just fine. However, my actual TDB store is 2.7 GB, and the job has now been running for about a week and is still going.

Is that normal, or am I doing something wrong? Is there an alternative to the Jena rule engine that I should use?

Here is a small piece of the code:

import java.util.List;

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class Ruleset {
  private List<Rule> rules = null;
  private GenericRuleReasoner reasoner = null;

  public Ruleset(String rulesSource) {
    this.rules = Rule.rulesFromURL(rulesSource);
    this.reasoner = new GenericRuleReasoner(rules);
    reasoner.setOWLTranslation(true);
    reasoner.setTransitiveClosureCaching(true);
  }

  public InfModel applyto(Model model) {
    return ModelFactory.createInfModel(reasoner, model);
  }

  public static void main(String[] args) {
    System.out.println(" ... Running the Rule Engine ...");
    String rulepath = "src/schemaRules.osr";
    Ruleset rule = new Ruleset(rulepath);
    // data.tdb is the TDB-backed model (helper class not shown here)
    InfModel infModel = rule.applyto(data.tdb);
    infModel.close();
  }
}

Answer 1

Thanks Ian.

I was actually able to do it via SPARQL Update, as Dave advised, and it took only 10 minutes to finish the job.

Here is an example of the code:

System.out.println(" ... Load rules ...");
data.startQuery();
String query = data.loadQuery("src/sparqlUpdatesRules.tql");
data.endQuery();

System.out.println(" ... Inserting rules ...");
UpdateAction.parseExecute(query, inferredData.tdb);

System.out.println(" ... Printing RDF ...");
inferredData.exportRDF();

System.out.println(" ... Closing ...");
inferredData.close();

and here is an example of the SPARQL update:

INSERT {
   ?w ddids:carries ?p .
} WHERE {
   ?p ddids:is_in ?w .
}; 
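A single pass like this is enough for rules such as the inverse-property rule above, but rules whose conclusions can match their own premises (transitive properties, for example) have to be re-run until no new triples appear. Here is a minimal sketch of such a fixpoint loop, assuming a hypothetical TDB location and with the update text loaded as in the snippet above:

import org.apache.jena.query.Dataset;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.update.UpdateAction;

public class FixpointUpdate {
  public static void main(String[] args) {
    Dataset ds = TDBFactory.createDataset("data/inferredTDB"); // hypothetical location
    String query = "...";  // the SPARQL Update text, loaded as above
    long before, after;
    do {
      before = ds.getDefaultModel().size(); // triple count before this pass
      UpdateAction.parseExecute(query, ds); // apply the rule once
      after = ds.getDefaultModel().size();  // if the store grew, run another pass
    } while (after > before);
  }
}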

Thanks for your answers.

Answer 2

A large dataset in a persistent store is not a good match with Jena's rule system. The basic problem is that the RETE engine will make many small queries into the graph during rule propagation. The overhead in making these queries to any persistent store, including TDB, tends to make the execution times unacceptably long, as you have found.

Depending on your goals for employing inference, you may have some alternatives:

  • Load your data into a large enough memory graph, then save the inference closure (the base graph plus the entailments) to a TDB store in a single transaction (see the sketch after this list). Thereafter, you can query the store without incurring the overhead of the rules system. Updates, obviously, can be an issue with this approach.

  • Have your data in TDB, as now, but load a subset dynamically into a memory model to use live with inference. This makes updates easier (as long as you update both the memory copy and the persistent store), but requires you to partition your data.
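Here is a minimal sketch of the first option, assuming hypothetical store locations and reusing the rules file from the question; the entire closure is materialized in memory before anything is written:

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.tdb.TDBFactory;

public class ClosureToTdb {
  public static void main(String[] args) {
    // Copy the persistent graph into a plain memory model.
    Model tdbModel = TDBFactory.createDataset("data/tdb").getDefaultModel(); // hypothetical path
    Model memModel = ModelFactory.createDefaultModel();
    memModel.add(tdbModel);

    // Run the rules over the in-memory copy only, so the rule engine
    // never has to query the persistent store.
    GenericRuleReasoner reasoner =
        new GenericRuleReasoner(Rule.rulesFromURL("src/schemaRules.osr"));
    InfModel inf = ModelFactory.createInfModel(reasoner, memModel);

    // Write the base graph plus entailments to a fresh store in one go.
    Model out = TDBFactory.createDataset("data/tdb-inferred").getDefaultModel(); // hypothetical path
    out.add(inf);
    out.close();
  }
}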

If you only want some basic inferences, such as closure of the rdfs:subClassOf hierarchy, you can use the infer command line tool to generate an inference closure which you can load into TDB:

$ infer -h
infer --rdfs=vocab FILE ...
General
  -v   --verbose         Verbose
  -q   --quiet           Run with minimal output
  --debug                Output information for debugging
  --help
  --version              Version information

The infer tool can be more efficient because it doesn't require a large memory model. However, it is restricted in the inferences that it will compute.
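For example, assuming hypothetical file names, you could generate the closure and then bulk-load it with tdbloader:

$ infer --rdfs=schema.ttl data.nt > closure.nt
$ tdbloader --loc=DB closure.nt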

If none of these work for you, you may want to consider commercial inference engines such as OWLIM or Stardog.