I am trying to find a better way to convert plain text (with a predefined length for each field) to XML. For example, the input text can be "Testuser new york 10018": the first 11 characters are the user name, the next 12 characters are the city, and the next 5 characters are the zip code. So I need to build XML from the above string using the predefined field lengths.
I am considering two approaches:
1. Define a business entity, fill its properties by using substring functions on the input text, and then serialize the entity to XML.
2. Predefine the XML structure, then use XSLT to navigate to each node and fill in the values by using substring functions on the input text.
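The first approach can be sketched roughly as follows. This is only an illustration in Python, not code from the question: the `User` entity, the element names, and the `FIELDS` layout are assumptions based on the 11/12/5 widths given above, and the sample line is padded by hand to those widths.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Assumed layout from the example: (field name, width), in input order.
FIELDS = [("name", 11), ("city", 12), ("zip", 5)]


@dataclass
class User:
    """Hypothetical business entity holding one parsed record."""
    name: str
    city: str
    zip: str


def parse_record(line: str) -> User:
    """Slice a fixed-width line into fields and trim the padding."""
    values = {}
    start = 0
    for field, width in FIELDS:
        values[field] = line[start:start + width].strip()
        start += width
    return User(**values)


def to_xml(user: User) -> str:
    """Serialize the entity as a <user> element with one child per field."""
    root = ET.Element("user")
    for field, _ in FIELDS:
        ET.SubElement(root, field).text = getattr(user, field)
    return ET.tostring(root, encoding="unicode")


# Sample record padded to the assumed widths (11 + 12 + 5 = 28 characters).
line = "Testuser".ljust(11) + "new york".ljust(12) + "10018"
print(to_xml(parse_record(line)))
```

Keeping the widths in a single table means a layout change touches only `FIELDS` and the entity, not the slicing logic.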
The statement that XSLT "isn't suitable for transforming from structured text to XML" and the statement that "XSLT must have XML as the input document" are both wrong.
In fact, Approach 2 is quite easy to accomplish with XSLT:
I. XSLT 1.0:
when this transformation is applied on the specially formatted text (wrapped within a single top element to be made well-formed -- as we'll see in XSLT 2.0 such wrapping isn't necessary):
the wanted result is produced:
Notes:
This is just a demo that shows how to accomplish the task. This is why I am not processing fixed-width fields (which would be even easier), but space-separated values.
Any space contained in any value is entered in the input as an underscore (or any character of our choosing that we know will never be part of any value). On output, any underscore is translated to a real space.
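The convention in these notes (space-separated values, with an underscore standing in for a space inside a value) can be illustrated outside XSLT as well. The following is a minimal Python sketch of that convention only, not the stylesheet from this answer; the element names in `FIELD_NAMES` and the `<user>` wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, one per space-separated token.
FIELD_NAMES = ["name", "city", "zip"]


def record_to_xml(record: str) -> str:
    """Split on whitespace, then turn each underscore back into a space."""
    tokens = record.split()
    if len(tokens) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(tokens)}")
    root = ET.Element("user")
    for name, token in zip(FIELD_NAMES, tokens):
        ET.SubElement(root, name).text = token.replace("_", " ")
    return ET.tostring(root, encoding="unicode")


print(record_to_xml("Testuser new_york 10018"))
```

The placeholder character has to be one that never occurs in real values, otherwise genuine underscores would be turned into spaces on output.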
II. XSLT 2.0 solution:
when this transformation is applied on any XML document (not used and actually not needed, as in XSLT 2.0 it isn't necessary to have a source XML document), and if the file C:\temp\delete\delete.txt is:
again the wanted, correct result is produced:
Notes:
Use of the standard XSLT 2.0 function unparsed-text().
Use of the standard XPath 2.0 function tokenize().
Final note:
Very complex text processing has been done, in an industrial way, entirely in XSLT. The FXSL library contains a generic LR(1) parser and a tweaked YACC that produces XML-formatted tables, which are the input to this generic run-time LR(1) parser.
Using this tool I successfully built parsers for such complex languages as JSON and XPath 2.0.