How do I parse XML with repeated tags in Scala using the xtract library?

I have the following problem: I'm using [xtract][1], a Scala library for parsing XML, at the latest version, 2.0.0. I'm trying to parse an XML file like this:

<annotation>
    <folder>home</folder>
    <filename>source</filename>
    <source>
        <database>Unknown</database>
    </source>
    <object>
        <name>name1</name>
        <truncated>0</truncated>
        <difficult>1</difficult>
    </object>
    <object>
        <name>name2</name>
        <truncated>1</truncated>
        <difficult>1</difficult>
    </object>
</annotation>

Here is the class I'm parsing the XML into:

case class MyObject(
    name: String,
    truncated: Boolean
)

trait MyObjectXml {
  implicit val xmlReader: XmlReader[MyObject] = (
    (__ \ "object" \ "name").read[String],
    (__ \ "object" \ "truncated").read[Boolean]
  ).mapN(MyObject.apply)
}

object MyObjectXml extends MyObjectXml

Here I do the parsing:

//open an XML file
val xml = scala.xml.XML.loadString(
  bufferedSource
  .getLines()
  .mkString("\n")
)
XmlReader.seq[MyObject].read(xml) match {
  case ParseSuccess(seatsXml) => ...
  case ParseFailure(errors) => ...
  case PartialParseSuccess(geometry, errors) => ...
}

I get a ParseFailure:

List(MultipleMatchesError(/object/name), MultipleMatchesError(/object/truncated))

[The error][2] indicates that the path matched multiple nodes when only one was expected. What I want is a Seq[MyObject]. How do I tell XmlReader that I want to parse all the repeated tags into a Seq? I thought applying .seq[T] would do it, but it doesn't. Help would be much appreciated!

Update: Sorry, the xtract version in my project is actually 2.2.1, not 2.2.0. I had just looked at the README and expected it to show the latest version number.

[1]: https://github.com/lucidsoftware/xtract
[2]: http://lucidsoftware.github.io/xtract/core/api/com/lucidchart/open/xtract/MultipleMatchesError.html

Best answer

There are several issues with your code. The path object/name really is ambiguous: the document contains two object elements, so the reader cannot know whether you meant the name of the first or of the second.
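To see the ambiguity without xtract, here is a minimal sketch that uses only scala.xml and a trimmed copy of the XML from the question. It assumes the scala-xml module is on the classpath; AmbiguousPathDemo is a hypothetical name used only for illustration. The same path selects two name nodes, which is the situation the MultipleMatchesError describes.

```scala
import scala.xml.{Elem, NodeSeq, XML}

object AmbiguousPathDemo {
  // Trimmed copy of the question's document: two <object> elements.
  val xml: Elem = XML.loadString(
    """<annotation>
      |  <object><name>name1</name><truncated>0</truncated></object>
      |  <object><name>name2</name><truncated>1</truncated></object>
      |</annotation>""".stripMargin
  )

  // Selects the <name> child of every <object> child of the root.
  val names: NodeSeq = xml \ "object" \ "name"

  def main(args: Array[String]): Unit = {
    println(names.length)                     // 2
    println(names.map(_.text).mkString(", ")) // name1, name2
  }
}
```

A reader that expects exactly one String at that path has no way to choose between the two results.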

First, you need to define a class that contains the desired sequence, and its reader:

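import com.lucidchart.open.xtract.{XmlReader, __}
import com.lucidchart.open.xtract.XmlReader._ // brings seq into scope
import cats.syntax.all._ // provides mapN

case class MyObjects(myObjects: Seq[MyObject])

object MyObjects {
  // Select every <object> child and parse each one with the implicit XmlReader[MyObject].
  implicit val reader: XmlReader[MyObjects] = (__ \ "object").read(seq[MyObject]).map(apply)
}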

Since this reader already consumes the object step of the path, the reader for MyObject must omit that step, meaning it should be:

case class MyObject(name: String, truncated: Boolean)

object MyObject {
  implicit val xmlReader: XmlReader[MyObject] = (
    (__ \ "name").read[String],
    (__ \ "truncated").read[Boolean]
  ).mapN(MyObject.apply)
}

Another issue is that you are parsing truncated as a Boolean, but the values in the XML are actually the integers 0 and 1. So I changed the XML to contain true and false instead.
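If changing the XML is not an option, one possible alternative is to read the flag as an Int and convert it. The following is a minimal sketch under that assumption, not part of the code that was run for this answer. MyObjectFromInt and IntFlagDemo are hypothetical names, and the sketch relies on xtract's built-in Int reader and on mapN from cats being available.

```scala
import cats.syntax.all._
import com.lucidchart.open.xtract.{ParseResult, XmlReader, __}

// Hypothetical variant of MyObject; the name is chosen only to avoid a clash.
case class MyObjectFromInt(name: String, truncated: Boolean)

object MyObjectFromInt {
  implicit val xmlReader: XmlReader[MyObjectFromInt] = (
    (__ \ "name").read[String],
    // Assumption: read the flag as an Int and treat any non-zero value as true.
    (__ \ "truncated").read[Int].map(_ != 0)
  ).mapN(MyObjectFromInt.apply)
}

object IntFlagDemo {
  // One <object> element copied from the question's XML, unchanged.
  val node = <object><name>name2</name><truncated>1</truncated><difficult>1</difficult></object>

  val result: ParseResult[MyObjectFromInt] = XmlReader.of[MyObjectFromInt].read(node)

  def main(args: Array[String]): Unit = println(result)
}
```

The trade-off is that any integer other than 0 counts as true, which may or may not be the behaviour you want for malformed input.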

Finally, to read the document, we can do:

XmlReader.of[MyObjects].read(xml) match {
  case ParseSuccess(seatsXml) => println(seatsXml)
  case ParseFailure(errors) => println(errors)
  case PartialParseSuccess(geometry, errors) => println(errors)
}

And the output is:

MyObjects(Vector(MyObject(name1,false), MyObject(name2,true)))

Code run at Scastie.