Mozilla defines the itemid attribute as:
The itemid global attribute provides microdata in the form of a unique, global identifier of an item.
Is that identifier meant to be unique among the pages of the same website, across the entire World Wide Web, or across the entire world?
And how does it differ from Schema.org’s identifier property?
Additional context:
For clarification, I will explain the research I did and what I found unclear.
In its definition and background notes, the identifier property mentions some caveats. I think they mean it should not be used when the specific type in use defines a more precise identifier property, but I did not fully understand what they say.
Sometimes a Thing seems to be identified by an itemid whose value is a URL:
<meta itemscope itemprop="mainEntityOfPage" itemType="https://schema.org/WebPage" itemid="https://google.com/article"/>
In that case, the itemid also serves as a URL, which makes things even more confusing, because Schema.org suggests the following markup to link to a page describing a Thing:
<div itemscope itemtype="https://schema.org/Person">
<a href="alice.html" itemprop="url">Alice Jones</a>
</div>
Apparently, url never serves as an identifier. Does that mean that, whenever relevant, I should prefer a combination of mainEntityOfPage and itemid instead of url, and that url should only be used for links related to the Thing, but never for the Thing’s main page?
While Microdata is not an RDF serialization, it’s close to one (see also: Microdata to RDF), and explaining this in RDF terms (using the Turtle serialization for examples) might be easier. You can convert Microdata snippets to Turtle with, for example, Gregg Kellogg’s RDF Distiller.
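With itemid
An RDF triple consists of a subject, a predicate, and an object. For example:
<http://dbpedia.org/resource/The_Lord_of_the_Rings> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Book> .
Or with prefixed names:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:The_Lord_of_the_Rings rdf:type schema:Book .
In Microdata, this triple could be encoded like this:
<div itemscope itemtype="http://schema.org/Book" itemid="http://dbpedia.org/resource/The_Lord_of_the_Rings"></div>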
What this means: the thing with the IRI http://dbpedia.org/resource/The_Lord_of_the_Rings is a book.
Without itemid
Without itemid, you would have a blank node as subject:
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Book> .
What this means: something is a book (a book exists).
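The difference between the two cases can be sketched in Python. This is a deliberately simplified, standard-library-only sketch of the type-triple part of the Microdata to RDF mapping: an item with itemid gets that IRI as its subject, an item without one gets a blank node. It ignores itemprop values, nesting and base-URL resolution, and the class and function names are hypothetical; for real data, use a proper processor such as the RDF Distiller.

```python
from html.parser import HTMLParser
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class ItemTypeExtractor(HTMLParser):
    """Collect (subject, rdf:type, type) triples for every element with itemscope.

    Simplified on purpose: itemprop values, nesting and base-URL resolution
    of relative itemid values are not handled.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._blank_node_ids = count()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # HTMLParser passes attribute names lowercased
        if "itemscope" not in attributes:
            return
        item_id = attributes.get("itemid")
        if item_id:
            subject = f"<{item_id}>"  # global IRI: other data can refer to this item
        else:
            subject = f"_:b{next(self._blank_node_ids)}"  # only meaningful in this document
        for item_type in (attributes.get("itemtype") or "").split():
            self.triples.append((subject, f"<{RDF_TYPE}>", f"<{item_type}>"))


def microdata_type_triples(html):
    """Return N-Triples lines for the item types found in an HTML fragment."""
    parser = ItemTypeExtractor()
    parser.feed(html)
    parser.close()
    return [" ".join(triple) + " ." for triple in parser.triples]


with_itemid = ('<div itemscope itemtype="http://schema.org/Book" '
               'itemid="http://dbpedia.org/resource/The_Lord_of_the_Rings"></div>')
without_itemid = '<div itemscope itemtype="http://schema.org/Book"></div>'

for line in microdata_type_triples(with_itemid) + microdata_type_triples(without_itemid):
    print(line)
```

The first snippet yields a triple about a globally named resource; the second yields the same statement about an anonymous `_:b0`, which nobody outside the document can point to.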
About the subject IRI
The IRI has to be universally unique.
The IRI doesn’t have to be an HTTP/HTTPS IRI. Even if it is an HTTP/HTTPS IRI, it doesn’t have to be resolvable on the Web.
The IRI has to represent the actual thing, not merely a document about that thing (unless you want to say something about that very document, of course).
For example, the first IRI represents the intellectual creation of Tolkien, while the second and third IRIs represent documents about his intellectual creation:
Saying that https://en.wikipedia.org/wiki/The_Lord_of_the_Rings is a schema:Book would be semantically wrong.
If you reuse existing IRIs, you have to make sure to use them according to their definition.
If you mint your own IRIs (under your own domain), you have to make sure to "reserve" them, so that their meaning (i.e., what they represent) doesn’t change.
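Reserving IRIs is mainly an organizational commitment, but the idea can be sketched in code. The Python sketch below assumes the hash-IRI pattern used in the product example of this answer (appending #this to the page URL, so the thing and the page about it get different IRIs) and a hypothetical in-memory IriRegistry class; a real site would need to persist such assignments and keep the URLs stable over time.

```python
from urllib.parse import urldefrag


class IriRegistry:
    """Hypothetical in-memory registry that 'reserves' minted IRIs.

    Once an IRI has been assigned a meaning, assigning it a different
    meaning raises an error, so what the IRI represents never changes.
    """

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._meanings = {}

    def mint(self, path, meaning):
        # The '#this' fragment names the thing itself; the URL without the
        # fragment stays free for the document (page) describing the thing.
        iri = f"{self.base_url}/{path.strip('/')}#this"
        reserved = self._meanings.get(iri)
        if reserved is not None and reserved != meaning:
            raise ValueError(f"{iri} is already reserved for {reserved!r}")
        self._meanings[iri] = meaning
        return iri

    @staticmethod
    def page_url(iri):
        # Removing the fragment gives the URL of the page about the thing.
        return urldefrag(iri).url


registry = IriRegistry("https://example.com")
product_iri = registry.mint("products/555", "the product sold as item 555")
print(product_iri)                        # https://example.com/products/555#this
print(IriRegistry.page_url(product_iri))  # https://example.com/products/555
```

Minting the same path again with the same meaning returns the same IRI; trying to reuse it for something else fails loudly instead of silently changing what the IRI means.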
Schema.org’s identifier and url properties
While the identifier property could also hold the subject IRI as its value, it can of course hold any other kind of identifier as well, and not all of them are IRIs.
For example, if you sell a product in your webshop, you might want to provide its product ID as a string. And as the IRI https://example.com/products/555#this represents the actual product (instead of the product page), you might want to provide the URL of the product page with url.
Now, if you do all this without providing a subject IRI, you get something close in meaning, but with two drawbacks: it’s harder for data consumers to integrate your data with their data, and you and others can’t easily link to that thing (which is arguably one of the core features of the Semantic Web).
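A Python sketch of the webshop example, assuming a product ID of "555" and a product page at https://example.com/products/555 (both hypothetical; the answer only gives the #this IRI). It emits the Microdata and the corresponding Turtle, once with and once without a subject IRI, so the two variants can be compared. The function names are illustrative and not part of any library.

```python
from html import escape

SCHEMA = "http://schema.org/"


def product_microdata(product_id, page_url, subject_iri=None):
    """Microdata for a product: itemid names the product, url points to its page."""
    item_id = f' itemid="{escape(subject_iri)}"' if subject_iri else ""
    return "\n".join([
        f'<div itemscope itemtype="{SCHEMA}Product"{item_id}>',
        f'  <span itemprop="identifier">{escape(product_id)}</span>',
        f'  <a itemprop="url" href="{escape(page_url)}">Product page</a>',
        "</div>",
    ])


def product_turtle(product_id, page_url, subject_iri=None):
    """The same statements in Turtle; a blank node stands in when there is no itemid."""
    subject = f"<{subject_iri}>" if subject_iri else "_:product"
    return "\n".join([
        f"@prefix schema: <{SCHEMA}> .",
        f"{subject} a schema:Product ;",
        # Assumes a simple ID without quotes or backslashes (no Turtle escaping).
        f'    schema:identifier "{product_id}" ;',
        f"    schema:url <{page_url}> .",
    ])


PAGE = "https://example.com/products/555"  # assumed URL of the product page
THING = PAGE + "#this"                     # IRI of the product itself

with_iri = product_turtle("555", PAGE, subject_iri=THING)
without_iri = product_turtle("555", PAGE)
print(product_microdata("555", PAGE, subject_iri=THING))
print(with_iri)
print(without_iri)
```

In both variants, identifier carries the shop’s own ID as a plain string literal and url points to the document; only the itemid variant gives the product itself a name that others can link to.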
If it’s possible for you to provide a subject IRI, there is no good reason not to do it.