Many articles on the internet (like this one) suggest using xml:lang or some custom attribute to encode meta-information about the language inside XML tags. They mention that these codes have to comply with the BCP 47 standard.
Let's see what happens if I declare the language attribute the way these articles suggest:
- Inside DTD:
<!ATTLIST text xml:lang NMTOKEN #IMPLIED>
- Inside XML:
<text xml:lang="YODU991Yklew-e-ijsw02ijwk">...</text>
What is the expected result?
The DTD validator would check whether YODU991Yklew-e-ijsw02ijwk is a real BCP 47 language code and whether the country and script exist, and mark it red if any of those codes is incorrect. Exactly the same way as http://schneegans.de/ helps validate these codes (WRONG code vs. CORRECT code).
What happens instead?
The validator perceives this attribute only as some text and does not check whether it is a real language code or some gibberish.
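
To make the gap concrete, here is a minimal Python sketch of the two checks involved. It is not how any particular DTD validator is implemented; the function names `is_nmtoken` and `is_wellformed_langtag` are hypothetical, the NMTOKEN pattern is an ASCII-only approximation of the XML `Nmtoken` production, and the language-tag pattern is my simplified reading of the RFC 5646 grammar (grandfathered tags are left out). It only tests well-formedness; checking that a language, script or country actually exists would additionally require a lookup in the IANA Language Subtag Registry.

```python
import re

# ASCII-only approximation of the XML Nmtoken production (an assumption:
# the real production also allows many non-ASCII name characters).
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")

# Simplified well-formedness pattern based on the RFC 5646 grammar.
# Assumption: grandfathered tags are omitted for brevity.
LANGTAG = re.compile(
    r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3} | [a-z]{4} | [a-z]{5,8})  # language, optional extlang
        (?:-[a-z]{4})?                              # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                 # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*    # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*        # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                  # private use suffix
      | x(?:-[a-z0-9]{1,8})+                        # private-use-only tag
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_nmtoken(value: str) -> bool:
    """Roughly what an NMTOKEN attribute declaration checks: name characters only."""
    return NMTOKEN_ASCII.fullmatch(value) is not None


def is_wellformed_langtag(value: str) -> bool:
    """Checks the shape of a BCP 47 tag, not whether its subtags are registered."""
    return LANGTAG.fullmatch(value) is not None


if __name__ == "__main__":
    sample = "YODU991Yklew-e-ijsw02ijwk"
    print("NMTOKEN:", is_nmtoken(sample))
    print("BCP 47 well-formed:", is_wellformed_langtag(sample))
```

The sample value from the question passes the first check and fails the second, which matches the behaviour described above: the NMTOKEN declaration only constrains which characters may appear, not the structure of the tag.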