I have an RDF file that I load with graph.LoadFromFile(). It has been parsed successfully for years in another language, but with dotNetRDF in C# it throws the error "The value '4259-306N4220DP6' for rdf:ID is not valid, RDF IDs can only be valid NCNames as defined by the W3C XML Namespaces specification." Is there a way to bypass this specific rdf:ID and log it, or a namespace I can manually include to allow it, or pretty much any other workaround?
When I removed the rdf:ID in question, parsing continued, but removing it in production is not an option. When I added an underscore to the front of the value instead, processing also continued.
The message from dotNetRDF is correct: the exhibited rdf:ID is syntactically invalid. An NCName cannot begin with a digit, which is why prefixing an underscore made the error go away. One solution would be to rewrite attributes of the form rdf:ID="x" to rdf:about="#x", since the latter does not have the same restrictions as the former (but see my Closing Remarks, below).
Unfortunately, at time of writing there is no public API within dotNetRDF 3.1 that will allow us to correct or discard erroneous elements on the fly. The parser throws an exception that terminates processing at the point of the first error. That leaves us with no choice but to correct the XML prior to feeding it to dotNetRDF.
Ideally, the upstream program would be changed. But since that has been ruled out in this case, we will have to take matters into our own hands.
The code that follows is a bare-bones C# scripting example that shows a way to perform the rewrites using DOM manipulation. A streaming solution might be preferred, but the code is long enough as it is :)
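As a minimal sketch of that kind of rewrite, the following uses System.Xml.Linq to find each rdf:ID whose value is not a valid NCName, replace it with rdf:about="#x", and return the offending values so they can be logged. The names RdfIdFixer, IsNCName and RewriteInvalidIds, and the file names in the usage comment, are illustrative and not part of dotNetRDF. It assumes the invalid rdf:ID attributes sit on node elements; on a property element rdf:ID requests reification and rdf:about is not permitted there, so such cases would need separate handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class RdfIdFixer
{
    // The RDF syntax namespace that the rdf: prefix is conventionally bound to.
    private static readonly XNamespace Rdf =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // True when the value is a valid NCName, i.e. acceptable as an rdf:ID.
    public static bool IsNCName(string value)
    {
        try
        {
            XmlConvert.VerifyNCName(value);
            return true;
        }
        catch (XmlException) { return false; }      // invalid characters
        catch (ArgumentException) { return false; } // null or empty value
    }

    // Rewrites every invalid rdf:ID="x" to rdf:about="#x" in place.
    // Returns the rewritten values so the caller can log them.
    public static List<string> RewriteInvalidIds(XDocument doc)
    {
        var rewritten = new List<string>();

        // Snapshot the elements first so the document is not mutated
        // while it is being enumerated.
        foreach (var element in doc.Descendants().ToList())
        {
            var id = element.Attribute(Rdf + "ID");
            if (id == null || IsNCName(id.Value))
            {
                continue; // valid IDs are left untouched
            }

            var value = id.Value;
            id.Remove();
            element.SetAttributeValue(Rdf + "about", "#" + value);
            rewritten.Add(value);
        }

        return rewritten;
    }
}

// Hypothetical usage (file names are placeholders):
//
//   var doc = XDocument.Load("input.rdf");
//   foreach (var bad in RdfIdFixer.RewriteInvalidIds(doc))
//       Console.Error.WriteLine($"Rewrote invalid rdf:ID '{bad}'");
//   doc.Save("fixed.rdf");
//   // fixed.rdf can then be passed to graph.LoadFromFile()
```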
Result:
Closing Remarks
There is a reason why rdf:ID has this restriction. The assumption is that the ID is a fragment identifier pointing to an element within an XML document, and XML IDs carry similar restrictions. Some other content types also have those restrictions, so best-practice advice is to conform, at least in hash-style vocabularies. Slash-style vocabularies do not have the same issue (but have other implications).
Of course, if the IRI is never dereferenced in this way then none of this matters.