I'm trying to write a RelaxNG schema that has the following rules:
- A
line
element can contain zero or morea
andb
elements. - Every
a
element must have a correspondingb
element and vice-versa. a
elements must always precede their matchingb
elements.
So, the following should all be considered valid:
<line><a/><b/></line>
<line><a/><a/><b/><b/></line>
<line><a/><a/><b/><a/><b/><b/></line>
Meanwhile, the following are all invalid:
<line><b/><a/></line>
<line><a/><a/><b/></line>
<line><a/></line>
<line><b/></line>
How can this be expressed in RelaxNG? My first thought was to create a recursive reference, like so:
element line { pair* }+
pair = a, pair?, b
a = element a { empty }
b = element b { empty }
However, Jing considers this a "bad recursive reference to 'pair'". I can't for the life of me figure out how to solve this! Any ideas?
The patterns of Relax NG (like the content models of DTDs and XSD) are essentially regular expressions; they define regular languages over the set of element names and
text
. The a/b matching you appear to be looking for requires a context-free grammar; it cannot, therefore, be defined in Relax NG.It can, of course, be approximated. If you expect that in practice you will never have more than five a/b pairs (or, more precisely: never more than five
a
elements for which you have not yet seen the matchingb
), you can define either of the following languages.First:
This defines the subset of your language which strictly enforces your two rules but cannot handle a/b pairs nesting more than 10 deep. (Or 11, depending on what you're counting.) Every document accepted by this definition will be a member of the language you actually want, but not every member of the language will be accepted.
If rejecting valid instances of the language is not acceptable, you can define patterns as above, but redefine pair0 as:
This defines a superset of the language, in which the rules are enforced for a/b pairs up to the maximum level of nesting, but in which the rules are abandoned once the nesting level exceeds that maximum. In this case, every member of the desired language will be accepted, but so will some garbage that shouldn't be.
Whether these approximations are any use in your application I leave to you to decide.
If approximations are not acceptable, you may find it easier to get what you need by defining the XML differently.
One simple change would be to wrap each matching
a
andb
in an element (which I'll calle
):Now the validity of the document provides a very simple guarantee that the
a
elements andb
elements pair up as required and that eacha
precedes its matchingb
.Given that the
a
andb
are empty, that eacha
immediately follows a start-tag fore
, and that eachb
immediately precedes the end-tag for the samee
element, you could replace thea
andb
elements entirely with the start-tags fore
and end-tags fore
, respectively, and declareline
asSAX interfaces, DOM interfaces, XSLT, XQuery, and every other method I know of handling XML makes it roughly as simple to associate actions with the start and end of an element as it is to associate actions with an empty element.
Your valid examples then become:
and your invalid examples become non-wellformed data.