In Scala, how can I put an incrementing ID in an XML element using a RuleTransformer / RewriteRule?


I want to read in an XML file and put an incrementing id on specific elements. Here is some test code I wrote to figure out how to do that:

import scala.xml._
import scala.xml.transform._

val testXML =
 <document>
    <authors>
      <author>
        <first-name>Firstname</first-name>
        <last-name>Lastname</last-name>
      </author>
    </authors>
 </document>


def addIDs(node : Node) : Node = {

    object addIDs extends RewriteRule {
      var authorID = -1
      var emailID = -1
      var instID = -1

      override def transform(elem: Node): Seq[Node] =
      {
        elem match {

          case Elem(prefix, "author", attribs, scope, _*) =>
            //println("element author: " + elem.text)
            if ((elem \ "@id").isEmpty) {
              println("element id is empty:" + elem\"@id")
              authorID += 1
              println("authorID is " + authorID)
              elem.asInstanceOf[Elem] % Attribute(None, "id", Text(authorID.toString), Null)
            } else {
              elem
            }


          case Elem(prefix, "email", attribs, scope, _*) =>
            println("EMAIL")
            elem.asInstanceOf[Elem] % Attribute(None, "id", Text(authorID.toString), Null)

          case Elem(prefix, "institution", attribs, scope, _*) =>
            println("INST")
            elem.asInstanceOf[Elem] % Attribute(None, "id", Text(instID.toString), Null)

          case other =>
            other
        }
      }
    }
    object transform extends RuleTransformer(addIDs)
    transform(node)
}


val newXML = addIDs(testXML)

This code runs, but the ids don't come out as expected:

element id is empty:
authorID is 0
element id is empty:
authorID is 1
element id is empty:
authorID is 2
element id is empty:
authorID is 3
element id is empty:
authorID is 4
element id is empty:
authorID is 5
element id is empty:
authorID is 6
element id is empty:
authorID is 7
newXML: scala.xml.Node = <document>
    <authors>
        <author id="7">
           <first-name>Firstname</first-name>
           <last-name>Lastname</last-name>
        </author>
    </authors>
  </document>

It looks like the transformer visits each node multiple times, incrementing the id on every visit, and only stops once the id has reached 7. Why does it touch the node so many times before finishing with it? Is there something I could do differently to tell it that it is done with that node?

I thought maybe it was traversing over the newly modified node, hence my check for the element containing an attribute named 'id'. But that doesn't seem to work. Maybe it's a bad idea to do this in the first place?

Thanks for any help with this.

1 Answer

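It looks like I hit this Scala bug: https://issues.scala-lang.org/browse/SI-3689 (BasicTransformer has exponential complexity). Because of it, the rule is run on the same node many times, so a stateful rule like my counter gets incremented on every visit.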

My workaround was to do this: https://stackoverflow.com/a/1089519/3935595
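
For anyone who wants to avoid RuleTransformer for this altogether, here is a minimal sketch of one alternative (not necessarily what the linked answer does): a plain recursive function that visits every element exactly once, so the counter is only incremented once per author. The names AddAuthorIds, addAuthorIds, visit and nextId are my own, and it assumes scala-xml is on the classpath.

```scala
import scala.xml._

object AddAuthorIds {
  // Hypothetical helper: gives every <author> without an id attribute
  // the next integer id, in document order.
  def addAuthorIds(root: Node): Node = {
    var nextId = 0 // mutable counter, local to a single call

    def visit(node: Node): Node = node match {
      case e: Elem =>
        // Tag this element first so ids follow document order (pre-order).
        val tagged =
          if (e.label == "author" && e.attribute("id").isEmpty) {
            val withId = e % Attribute(None, "id", Text(nextId.toString), Null)
            nextId += 1
            withId
          } else e
        // Recurse into the children exactly once.
        tagged.copy(child = tagged.child.map(visit))
      case other =>
        other // text, comments, etc. are returned unchanged
    }

    visit(root)
  }
}
```

With the testXML from the question, AddAuthorIds.addAuthorIds(testXML) should give the single author id="0". Authors that already have an id are left alone, and the same pattern can be extended with separate counters for email and institution.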