I would like to write a SPARQL construct query that constructs only those triples that are not already in the graph.
Consider the following example graph:
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix sdhss: <https://r11.eu/ns/prosopography/> .
@prefix : <http://www.example.org/> .
:0b5000b1e9 a sdhss:C23 .
:cebbd8cac9 a :E13_sdhss_P36 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a crm:E21_Person ] ;
    crm:P14_carried_out_by :d4f0bc5a29 ;
    crm:P17_was_motivated_by :41cf6794ba .
:b427419ad6 a :E13_sdhss_P35 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a sdhss:C24 ] .
:b427419ad7 a :E13_sdhss_P4 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a crm:P52_Time-Span ] .
In order to construct missing P14/P17 triples for E13_sdhss_P35/P4 instances, I came up with something like this:
prefix : <http://www.example.org/>
prefix crm: <http://www.cidoc-crm.org/cidoc-crm/>
prefix sdhss: <https://r11.eu/ns/prosopography/>
construct {
    ?e13_a2 crm:P14_carried_out_by ?agent ;
        crm:P17_was_motivated_by ?source .
    ?e13_a3 crm:P14_carried_out_by ?agent ;
        crm:P17_was_motivated_by ?source .
}
where {
    ?c23 a sdhss:C23 .
    ?e13_a1 a :E13_sdhss_P36 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:E21_Person ] ;
        crm:P14_carried_out_by ?agent ;
        crm:P17_was_motivated_by ?source .
    ?e13_a2 a :E13_sdhss_P35 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a sdhss:C24 ] .
    ?e13_a3 a :E13_sdhss_P4 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:P52_Time-Span ] .
    minus {
        { ?e13_a2 crm:P14_carried_out_by ?agent . }
        union
        { ?e13_a2 crm:P17_was_motivated_by ?source . }
        union
        { ?e13_a3 crm:P14_carried_out_by ?agent . }
        union
        { ?e13_a3 crm:P17_was_motivated_by ?source . }
    }
}
The idea is to filter the result set against the pattern in the minus clause and have only the remaining bindings passed to the construct clause.
This works for the given example graph, yet as soon as I add P14/P17 assertions for E13_sdhss_P35 or E13_sdhss_P4, the construct query returns an empty result altogether.
E.g. the query returns an empty result for the following modification, which adds P14/P17 assertions to the E13_sdhss_P35 instance:
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix sdhss: <https://r11.eu/ns/prosopography/> .
@prefix : <http://www.example.org/> .
:0b5000b1e9 a sdhss:C23 .
:cebbd8cac9 a :E13_sdhss_P36 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a crm:E21_Person ] ;
    crm:P14_carried_out_by :d4f0bc5a29 ;
    crm:P17_was_motivated_by :41cf6794ba .
:b427419ad6 a :E13_sdhss_P35 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a sdhss:C24 ] ;
    crm:P14_carried_out_by :d4f0bc5a29 ;
    crm:P17_was_motivated_by :41cf6794ba .
:b427419ad7 a :E13_sdhss_P4 ;
    crm:P140_assigned_attribute_to :0b5000b1e9 ;
    crm:P141_assigned [ a crm:P52_Time-Span ] .
I am testing this on a local Fuseki setup; on a live GraphDB instance, a similar query seems to work just fine.
Possible solution:
The problem with earlier attempts apparently was that the SPARQL processor removes a whole solution from the result set as soon as a minus/filter clause matches, and that the construct clause only generates a template triple if all of its variables are bound. The where clause here yields a single solution that carries ?e13_a2 and ?e13_a3 together, so as soon as a single minus/filter clause caught something, that solution was dropped, ?e13_a2 was left without a binding, and no triples at all were generated. This also explains an observation I made earlier, namely that the order of minus/filter clauses was significant in previous attempts, e.g. in the OP's minus/union proposal.
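One way to check this is to run the where clause of the original query as a select, without the minus block. This is only a diagnostic sketch and assumes the same prefixes and example data as in the question. For both example graphs I would expect exactly one row (:b427419ad6, :b427419ad7, :d4f0bc5a29, :41cf6794ba), which is the one solution that any later minus/filter match takes away as a whole.

```sparql
prefix : <http://www.example.org/>
prefix crm: <http://www.cidoc-crm.org/cidoc-crm/>
prefix sdhss: <https://r11.eu/ns/prosopography/>

# same graph patterns as in the construct query, but without the minus block,
# so the solutions that minus/filter would operate on become visible
select ?e13_a2 ?e13_a3 ?agent ?source
where {
    ?c23 a sdhss:C23 .
    ?e13_a1 a :E13_sdhss_P36 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:E21_Person ] ;
        crm:P14_carried_out_by ?agent ;
        crm:P17_was_motivated_by ?source .
    ?e13_a2 a :E13_sdhss_P35 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a sdhss:C24 ] .
    ?e13_a3 a :E13_sdhss_P4 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:P52_Time-Span ] .
}
```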
The problem of losing the ?e13_a2 binding on minus/filter matches is remedied by introducing a dedicated subject variable for each triple to be constructed (?e13_a2_p14, ?e13_a2_p17) before applying the filter constraint, so that a match only affects the one triple it concerns.
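A minimal sketch of how such per-triple subject variables could look, not necessarily the exact query I ended up with. The names ?e13_a3_p14, ?e13_a3_p17 and ?undef are assumptions added for illustration; ?undef is a deliberately never-bound variable, so that evaluating it raises an error and bind leaves its target variable unbound. Instead of removing the whole solution, each check then only decides whether one subject variable gets a value.

```sparql
prefix : <http://www.example.org/>
prefix crm: <http://www.cidoc-crm.org/cidoc-crm/>
prefix sdhss: <https://r11.eu/ns/prosopography/>

construct {
    # one subject variable per template triple: a triple whose subject
    # is unbound is skipped, the remaining triples are still generated
    ?e13_a2_p14 crm:P14_carried_out_by ?agent .
    ?e13_a2_p17 crm:P17_was_motivated_by ?source .
    ?e13_a3_p14 crm:P14_carried_out_by ?agent .
    ?e13_a3_p17 crm:P17_was_motivated_by ?source .
}
where {
    ?c23 a sdhss:C23 .
    ?e13_a1 a :E13_sdhss_P36 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:E21_Person ] ;
        crm:P14_carried_out_by ?agent ;
        crm:P17_was_motivated_by ?source .
    ?e13_a2 a :E13_sdhss_P35 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a sdhss:C24 ] .
    ?e13_a3 a :E13_sdhss_P4 ;
        crm:P140_assigned_attribute_to ?c23 ;
        crm:P141_assigned [ a crm:P52_Time-Span ] .

    # bind the subject only if the triple is still missing;
    # ?undef (hypothetical name) is never bound, so the else branch
    # errors and the target variable simply stays unbound
    bind(if(not exists { ?e13_a2 crm:P14_carried_out_by ?agent }, ?e13_a2, ?undef) as ?e13_a2_p14)
    bind(if(not exists { ?e13_a2 crm:P17_was_motivated_by ?source }, ?e13_a2, ?undef) as ?e13_a2_p17)
    bind(if(not exists { ?e13_a3 crm:P14_carried_out_by ?agent }, ?e13_a3, ?undef) as ?e13_a3_p14)
    bind(if(not exists { ?e13_a3 crm:P17_was_motivated_by ?source }, ?e13_a3, ?undef) as ?e13_a3_p17)
}
```

Against the modified graph from the question, this should leave ?e13_a2_p14 and ?e13_a2_p17 unbound and only generate the two missing triples for :b427419ad7.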