Very Weird Parsing of unusual xml


I have a problem using XMLMappable. This is likely pilot error, but after 3 days I still haven't found it. I've gone through the #xmlmapper questions and answers but haven't found anything that deals with this. The question "How to access grand child element?" looked promising, but I don't think it is the same issue. Please correct me if I'm wrong.

I have written an XML parser using XMLMappable. I built it in small increments and all went well until the last mapping class (Src). The parser doesn't use an xsd, but the predefined xsd for that element looks like:

<xs:complexType name="srcCT" mixed="true">
    <xs:choice maxOccurs="unbounded" minOccurs="0">
        <xs:element name="w" type="wCT"/>
    </xs:choice>
</xs:complexType>

This means that a src tag, if it exists, can contain zero or more runs of inner text alternating with w tags, like:

    <src> text <w> wtext </w> more text <w> another w tag </w>...</src>

The problem is that the parser results are inconsistent: some combinations of inner text and w tags are mapped, and others are not.

So, using the example parser I created my test and test data and am adding them here. Please excuse the ugly print lines:

import Foundation
import XMLMapper


class TestParser : XMLMappable
{
    var nodeName: String!

    var entries: [Entry]?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        entries <- map ["entry"]
    }
}





class Entry: XMLMappable
{
    var nodeName: String!

    var id : String?
    var lang : String?
    var word : W?
    var source : Src?

    var teststring : String?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        var raw : String?
        raw  <- map.attributes [ "id" ]
        guard raw != nil else { return }

        teststring <- map ["testString"]
        if teststring != nil
        {
            print ( "teststring = " + teststring! )
        }

        lang = String ( raw?.prefix ( 1 ) ?? "q" )
        id = String ( (raw?.dropFirst ( 1 ))!)
        print ( "************************** \n entry id = " + raw! )

        word <- map ["w"]
        source <- map ["src"]

        print ( "word = "  + (word?.word)! )
    }
}


class W: XMLMappable
{
    var nodeName: String!
    var word : String?
    var lang : String?
    var src : String?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        lang <- map ["_xml:lang"]
        src <- map [ "_src"]
        word <- map.innerText
    }
}


//  The P R O B L E M  Child
class Src: XMLMappable
{
    var nodeName: String!
    var srctext : String?
    var references : [W]? = [W] ()


    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        srctext <- map.innerText
        if srctext == nil
        {
            srctext = "???"
        }
        var word : W?
        word <- map ["w"]
        guard word != nil else { return }
        references?.append ( word! )

        print ( "source.w.reference = " + word!.word! )
        print ( "source .srctext = " + (srctext!) )

    }
}

========== The test data:

<?xml version="1.0" encoding="utf-8"?>
    <lexicon >
    <entry id="Q1a">
        <testString>Test string Q1</testString>
        <w xml:lang="eng">q1</w>
        <src>src parser never called for this entry</src>
    </entry>
    <entry id="Q2">
        <w xml:lang="eng">q2</w>
        <src>this doesn't (map.innerText returns nil and i change to ???) <w src="Q2a">This works (2a)</w>; never reached </src>
    </entry>
    <entry id="Q3">
        <w xml:lang="eng">q3</w>
        <src>map.innerText returns nil <w src="3">This does not work (3)</w>; never reached <w src="Q3a">never reached</w></src>
    </entry>
    <entry id="Q4">
        <w xml:lang="eng">q4</w>
        <src>map.innerText returns nil <w src="q4a">This Works: 4a</w>;</src>
    </entry>
    <entry id="Q5">
        <w xml:lang="eng">q5</w>
        <src>This works <w src="Q5a">and so does this: 5a</w></src>
    </entry>
</lexicon>

==============

and the output:

teststring = Test string Q1
************************** 
entry id = Q1a
word = q1
************************** 
entry id = Q2
source.w.reference = This works (2a)
source .srctext = return nil
word = q2
************************** 
entry id = Q3
word = q3
************************** 
entry id = Q4
source.w.reference = This Works: 4a
source .srctext = return nil
word = q4
************************** 
entry id = Q5
source.w.reference = and so does this: 5a
source .srctext = This works
word = q5

There are two general issues: 1) why does the parser sometimes pick up elements and other times not? 2) how do I correctly pick up multiple runs of inner text and multiple w tags?

Thank you for your assistance with this. I really hope there is a solution.

Joseph

Answer by gcharita:

You can refer to this issue in the XMLMapper repository.

Because the src element sometimes has more than one portion of text (innerText), you have to map it as an Array<String>. The same applies to the w elements inside src.

So, you can try replacing your Src class with this:

class Src: XMLMappable {
    var nodeName: String!

    var srctext: [String]?
    var references: [W]?

    required init(map: XMLMap) {}

    func mapping(map: XMLMap) {
        srctext <- map.innerText
        references <- map["w"]
    }
}

Even then, the mapped values might not be easy to work with, because the text portions and the w elements end up in two separate arrays.

For example, mapping the following element with the above model class:

<src>
    map.innerText returns nil 
    <w src="3">This does not work (3)</w>
    ; never reached 
    <w src="Q3a">never reached</w>
</src>

You end up having something like this:

// assuming that `entry` is the mapped Entry that contains the above `src` element
if let source = entry.source,
   let texts = source.srctext,
   let words = source.references {
    // the printed values are in comments
    print(texts[0]) // map.innerText returns nil
    print(texts[1]) // ; never reached
    print(words[0].word ?? "") // This does not work (3)
    print(words[1].word ?? "") // never reached
}
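
If you also need to know how the text portions and the w elements interleave, two separate arrays will not tell you. The sketch below is one possible way to keep document order. It does not use XMLMapper at all; it uses Foundation's XMLParser instead. The names SrcPiece, SrcCollector and parseSrcPieces are hypothetical and made up for this illustration, and the whitespace trimming is an assumption about what you want to keep.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical type: one piece of mixed content, kept in document order.
enum SrcPiece: Equatable {
    case text(String)
    case w(src: String?, word: String)
}

// Hypothetical delegate that collects the pieces of every <src> element.
final class SrcCollector: NSObject, XMLParserDelegate {
    private(set) var sources: [[SrcPiece]] = []
    private var pieces: [SrcPiece]? = nil   // non-nil only while inside <src>
    private var buffer = ""
    private var wSrc: String?
    private var inW = false

    // Store the buffered text (trimmed) as a .text piece, skipping blank runs.
    private func flushText() {
        let trimmed = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        if !trimmed.isEmpty { pieces?.append(.text(trimmed)) }
        buffer = ""
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "src" {
            pieces = []
            buffer = ""
        } else if elementName == "w", pieces != nil {
            flushText()                      // text that came before this <w>
            inW = true
            wSrc = attributeDict["src"]
        }
    }

    // May be called several times for one text run, so accumulate.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if pieces != nil { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        guard pieces != nil else { return }  // ignore elements outside <src>
        if elementName == "w", inW {
            let word = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
            pieces?.append(.w(src: wSrc, word: word))
            buffer = ""
            inW = false
        } else if elementName == "src" {
            flushText()                      // trailing text after the last <w>
            sources.append(pieces ?? [])
            pieces = nil
        }
    }
}

// Returns one array of pieces per <src> element found in the XML string.
func parseSrcPieces(_ xml: String) -> [[SrcPiece]] {
    let parser = XMLParser(data: Data(xml.utf8))
    let collector = SrcCollector()
    parser.delegate = collector
    parser.parse()
    return collector.sources
}
```

Calling parseSrcPieces on the Q3 entry from the question should give a single array with four pieces in this order: text, w, text, w. The w element that sits directly under entry is ignored because it is outside src.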

Hope this helps