Very Weird Parsing of unusual xml


I have a problem using XMLMappable. This is likely pilot error, but I haven't found it after 3 days. I've gone through the #xmlmapper questions and answers but haven't found anything that deals with this. The question "How to access grand child element?" looked promising, but I don't think it is the same issue. Please correct me if I'm wrong.

I have written an XML parser using XMLMappable. I built it in small increments and all went well until the last parser (Src). The parser doesn't use an XSD, but the predefined XSD looks like:

<xs:complexType name="srcCT" mixed="true">
    <xs:choice maxOccurs="unbounded" minOccurs="0">
        <xs:element name="w" type="wCT"/>
    </xs:choice>
</xs:complexType>

This means that a src tag, if present, can contain zero or more runs of inner text interleaved with w tags, like:

    <src> text <w> wtext </w> more text <w> another w tag </w>...</src>

The problem is that the parser's results are inconsistent: some combinations of inner text and w tags work and others don't.

So, using the example parser I created my test and test data and am adding them here. Please excuse the ugly print lines:

import Foundation
import XMLMapper


class TestParser : XMLMappable
{
    var nodeName: String!

    var entries: [Entry]?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        entries <- map ["entry"]
    }
}





class Entry: XMLMappable
{
    var nodeName: String!

    var id : String?
    var lang : String?
    var word : W?
    var source : Src?

    var teststring : String?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        var raw : String?
        raw  <- map.attributes [ "id" ]
        guard raw != nil else { return }

        teststring <- map ["testString"]
        if teststring != nil
        {
            print ( "teststring = " + teststring! )
        }

        lang = String ( raw?.prefix ( 1 ) ?? "q" )
        id = String ( (raw?.dropFirst ( 1 ))!)
        print ( "************************** \n entry id = " + raw! )

        word <- map ["w"]
        source <- map ["src"]

        print ( "word = "  + (word?.word)! )
    }
}


class W: XMLMappable
{
    var nodeName: String!
    var word : String?
    var lang : String?
    var src : String?

    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        lang <- map ["_xml:lang"]
        src <- map [ "_src"]
        word <- map.innerText
    }
}


//  The P R O B L E M  Child
class Src: XMLMappable
{
    var nodeName: String!
    var srctext : String?
    var references : [W]? = [W] ()


    required init ( map: XMLMap )
    {
    }

    func mapping ( map: XMLMap )
    {
        srctext <- map.innerText
        if srctext == nil
        {
            srctext = "???"
        }
        var word : W?
        word <- map ["w"]
        guard word != nil else { return }
        references?.append ( word! )

        print ( "source.w.reference = " + word!.word! )
        print ( "source .srctext = " + (srctext!) )

    }
}

========== The test data:

<?xml version="1.0" encoding="utf-8"?>
    <lexicon >
    <entry id="Q1a">
        <testString>Test string Q1</testString>
        <w xml:lang="eng">q1</w>
        <src>src parser never called for this entry</src>
    </entry>
    <entry id="Q2">
        <w xml:lang="eng">q2</w>
        <src>this doesn't (map.innerText returns nil and i change to ???) <w src="Q2a">This works (2a)</w>; never reached </src>
    </entry>
    <entry id="Q3">
        <w xml:lang="eng">q3</w>
        <src>map.innerText returns nil <w src="3">This does not work (3)</w>; never reached <w src="Q3a">never reached</w></src>
    </entry>
    <entry id="Q4">
        <w xml:lang="eng">q4</w>
        <src>map.innerText returns nil <w src="q4a">This Works: 4a</w>;</src>
    </entry>
    <entry id="Q5">
        <w xml:lang="eng">q5</w>
        <src>This works <w src="Q5a">and so does this: 5a</w></src>
    </entry>
</lexicon>

==============

and the output:

teststring = Test string Q1
************************** 
entry id = Q1a
word = q1
************************** 
entry id = Q2
source.w.reference = This works (2a)
source .srctext = return nil
word = q2
************************** 
entry id = Q3
word = q3
************************** 
entry id = Q4
source.w.reference = This Works: 4a
source .srctext = return nil
word = q4
************************** 
entry id = Q5
source.w.reference = and so does this: 5a
source .srctext = This works
word = q5

There are two general issues: 1) why the parser sometimes picks up elements and other times doesn't; 2) how to correctly pick up multiple runs of inner text and multiple tags.

Thank you for your assistance with this. I really hope there is a solution.

Joseph

There is 1 best solution below


You can refer to this issue in the XMLMapper repository.

Because the src element sometimes has more than one portion of text (innerText), you have to map it as an Array<String> (the same applies to the w elements inside src).

So, you can try replacing your Src class with this:

class Src: XMLMappable {
    var nodeName: String!

    var srctext: [String]?
    var references: [W]?

    required init(map: XMLMap) {}

    func mapping(map: XMLMap) {
        srctext <- map.innerText
        references <- map["w"]
    }
}

Even then, the mapped values might not be so easy to read, because the text portions and the w elements end up in two separate arrays.

For example, mapping the following element with the above model class:

<src>
    map.innerText returns nil 
    <w src="3">This does not work (3)</w>
    ; never reached 
    <w src="Q3a">never reached</w>
</src>

You end up having something like this:

// assuming that `entry` is the Entry in which you mapped the above `src` element
if let source = entry.source {
    // the printed values are in comments
    print(source.srctext?[0] ?? "") // map.innerText returns nil
    print(source.srctext?[1] ?? "") // ; never reached
    print(source.references?[0].word ?? "") // This does not work (3)
    print(source.references?[1].word ?? "") // never reached
}
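
If the relative order of the text portions and the w elements matters to you, note that the srctext and references arrays above are separate, so as far as I can tell the interleaving is not recorded directly. One possible workaround, sketched here under assumptions, is to bypass XMLMapper for the src element and walk it with Foundation's XMLParser instead. The names SrcPiece, SrcCollector and parseSrcPieces are hypothetical and made up for this illustration; they are not part of XMLMapper. The sketch trims whitespace and skips empty pieces, which may or may not be what you want.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives in this module on Linux
#endif

/// One piece of mixed content inside <src>, kept in document order.
enum SrcPiece: Equatable {
    case text(String)
    case w(src: String?, text: String)
}

/// Hypothetical helper (not part of XMLMapper): collects the pieces of <src>.
final class SrcCollector: NSObject, XMLParserDelegate {
    private(set) var pieces: [SrcPiece] = []
    private var insideSrc = false
    private var insideW = false
    private var buffer = ""
    private var wSrc: String?

    /// Emits the buffered characters as one piece; whitespace-only runs are dropped.
    private func flush() {
        let trimmed = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        buffer = ""
        guard !trimmed.isEmpty else { return }
        pieces.append(insideW ? .w(src: wSrc, text: trimmed) : .text(trimmed))
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "src" {
            insideSrc = true
        } else if insideSrc && elementName == "w" {
            flush()                        // close the text run before this <w>
            insideW = true
            wSrc = attributeDict["src"]
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per text run, so accumulate.
        if insideSrc { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w" && insideW {
            flush()                        // emit the <w> piece
            insideW = false
        } else if elementName == "src" {
            flush()                        // emit any trailing text
            insideSrc = false
        }
    }
}

/// Parses an XML string and returns the ordered pieces found inside <src>.
func parseSrcPieces(_ xml: String) -> [SrcPiece] {
    let collector = SrcCollector()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = collector
    parser.parse()
    return collector.pieces
}

let xml = "<src>map.innerText returns nil <w src=\"3\">This does not work (3)</w>; never reached <w src=\"Q3a\">never reached</w></src>"
for piece in parseSrcPieces(xml) {
    print(piece)
}
```

With this approach each piece carries its own kind (text or w), so the original sequence can be rebuilt by iterating the array once. The trade-off is that you give up XMLMapper's declarative mapping for this one element.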

Hope this helps