Is there a better regex for parsing a DTD?

I've got the DTD for OFX 1.03 (their latest version despite having developed and released 1.60, but I digress...)

I would like to use a regex with groups that splits an entity, element, or other tag into its parts for further processing, so that I can take a tag like this:

<!ENTITY % ACCTTOMACRO "(BANKACCTTO | CCACCTTO | INVACCTTO)">

And create an object like this:

new EntityTag { Name = "%ACCTTOMACRO", ChildTypes = new string[] { "BANKACCTTO", "CCACCTTO", "INVACCTTO" } };

I've got a regular expression that looks like this:

Regex re = new Regex(@"<!(\b)+([\s\S])?[^>]+>");  

Admittedly, I'm new to regex, so I'm pleased to have gotten this far: it gives me a match collection with one match for each tag in the DTD, excluding comments.

I would like to leverage grouping to facilitate creation of the previously mentioned object.
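To make the goal concrete, here is a rough sketch of the kind of grouping I have in mind, using named groups for a parameter entity only. The pattern, the `ParseEntity` helper, and the tuple it returns are illustrative assumptions rather than anything from the OFX spec, and it only handles a flat, double-quoted list of alternatives like the one above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Named groups: "name" is the entity name, "value" is the quoted replacement text.
// Assumption: parameter entities only, with a double-quoted value.
var entityPattern = new Regex(
    @"<!ENTITY\s+%\s+(?<name>[^\s""]+)\s+""(?<value>[^""]*)""\s*>");

// Hypothetical helper: returns the entity name (with its leading %) and the
// alternatives listed inside the value.
(string Name, string[] ChildTypes) ParseEntity(string tag)
{
    Match m = entityPattern.Match(tag);
    if (!m.Success)
        throw new FormatException("Not a parameter entity declaration: " + tag);

    // Strip the outer parentheses, then split on the '|' separators.
    string[] children = m.Groups["value"].Value
        .Trim('(', ')', ' ')
        .Split('|')
        .Select(s => s.Trim())
        .Where(s => s.Length > 0)
        .ToArray();

    return ("%" + m.Groups["name"].Value, children);
}

var entity = ParseEntity(
    @"<!ENTITY % ACCTTOMACRO ""(BANKACCTTO | CCACCTTO | INVACCTTO)"">");
Console.WriteLine(entity.Name);                          // %ACCTTOMACRO
Console.WriteLine(string.Join(", ", entity.ChildTypes)); // BANKACCTTO, CCACCTTO, INVACCTTO
```

The two tuple fields correspond to the Name and ChildTypes members of the object described above. Nested content models, single-quoted values, and ELEMENT or ATTLIST declarations would each need their own handling.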

If I'm on the totally wrong path, please tell me. However, if you do download this document, I think you may find it's not standard. (Visual Studio raises some red flags about the way the document is formatted.)

I don't expect anyone to go to the trouble, but for the curious here is the link to download the specs.

Best answer

It looks like they've got a schema available as well. Why not download the schema instead and parse that with an XML parser (for instance, LINQ-to-XML)?
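As a minimal sketch of what that could look like, the snippet below reads complex type names and their child element names with LINQ-to-XML. The XSD fragment is made up for illustration (the `AccountTo` type name in particular is hypothetical), so the real OFX schema will almost certainly be structured differently.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up XSD fragment for illustration only; not copied from the OFX schema.
string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:complexType name=""AccountTo"">
    <xs:choice>
      <xs:element name=""BANKACCTTO"" type=""xs:string""/>
      <xs:element name=""CCACCTTO"" type=""xs:string""/>
      <xs:element name=""INVACCTTO"" type=""xs:string""/>
    </xs:choice>
  </xs:complexType>
</xs:schema>";

XNamespace xs = "http://www.w3.org/2001/XMLSchema";
XDocument doc = XDocument.Parse(xsd);

// For each complex type, collect its name and the names of the elements it contains.
var types = doc.Descendants(xs + "complexType")
    .Select(t => (
        Name: (string)t.Attribute("name"),
        ChildTypes: t.Descendants(xs + "element")
                     .Select(e => (string)e.Attribute("name"))
                     .ToArray()))
    .ToList();

foreach (var t in types)
    Console.WriteLine($"{t.Name}: {string.Join(", ", t.ChildTypes)}");
// Prints: AccountTo: BANKACCTTO, CCACCTTO, INVACCTTO
```

For the downloaded schema you would call `XDocument.Load` on the file instead of `XDocument.Parse` on a string. The XML parser deals with comments, quoting, and whitespace, so none of that has to be encoded in a regex.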