How to delete root nodes of XML strings in R


I want to combine multiple XML strings (more than 1000) into one string in R. This can be done, for example, with xml_add_sibling from the xml2 package. However, I would like to get rid of the intermediate root nodes ("positions" in my example).

Input:

library(xml2)
position1 <- read_xml("<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

position2 <- read_xml("<positions>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

position3 <- read_xml("<positions>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>")

Code:

combined_XML <- xml_add_sibling(position1,position2)
combined_XML <- xml_add_sibling(combined_XML,position3)

Actual results:

<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>
<positions>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>
<positions>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>

Expected results:

<positions>
  <moneyMarket>
    <positionName>1</positionName>
    <notional>10000</notional>
    <currency>EUR</currency>
  </moneyMarket>
  <moneyMarket>
    <positionName>2</positionName>
    <notional>40000</notional>
    <currency>EUR</currency>
  </moneyMarket>
  <moneyMarket>
    <positionName>3</positionName>
    <notional>50000</notional>
    <currency>EUR</currency>
  </moneyMarket>
</positions>

maydin

I used the example data, which consists of three XML documents named position1, position2 and position3. Because the names share the prefix "position" followed by a number, I use get() to look each one up by name. I set i <- 3 because there are three documents.

If you have 1000 XML documents, set i <- 1000 instead. This assumes the documents are stored in variables named with "position" plus a number: position1, position2, position3, position4, ..., position1000.
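As a small illustration of this naming assumption, numbered variables can also be collected in one step with base R's mget(), which returns a named list and avoids calling get() repeatedly. This is only a sketch: the shortened XML strings and the names n, strings and docs are placeholders, not part of the original data.

```r
library(xml2)

# Placeholder documents (shortened); the real strings would be longer
position1 <- "<positions><moneyMarket><positionName>1</positionName></moneyMarket></positions>"
position2 <- "<positions><moneyMarket><positionName>2</positionName></moneyMarket></positions>"
position3 <- "<positions><moneyMarket><positionName>3</positionName></moneyMarket></positions>"

n <- 3  # number of numbered variables

# mget() looks up several variables by name and returns them as a named list
strings <- mget(paste0("position", seq_len(n)))

# Parse every string into an xml2 document
docs <- lapply(strings, read_xml)

names(docs)
```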

The code below copies the children of each document into the one before it, working backwards from the last, so that everything ends up in the first document, position1. Running xmlParse(position1) at the end then displays the combined result.

  library(xml2)  
  library(XML)

  position1 <- "<positions>
                  <moneyMarket>
                    <positionName>1</positionName>
                    <notional>10000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"

  position2 <- "<positions>
                  <moneyMarket>
                    <positionName>2</positionName>
                    <notional>40000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"

  position3 <- "<positions>
                  <moneyMarket>
                    <positionName>3</positionName>
                    <notional>50000</notional>
                    <currency>EUR</currency>
                  </moneyMarket>
                </positions>"


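  # Parse each string into an xml2 document
  position1 <- read_xml(position1)
  position2 <- read_xml(position2)
  position3 <- read_xml(position3)

  # Number of documents to merge
  i <- 3

  # Work backwards: copy the children of document i into document i-1,
  # so that everything accumulates in position1
  while (i > 1) {
    mychildren <- xml_children(get(paste0("position", i)))
    for (child in mychildren) {
      xml_add_child(get(paste0("position", i - 1)), child)
    }
    i <- i - 1
  }

  # Display the combined document
  xmlParse(position1)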

Output:

  <?xml version="1.0" encoding="UTF-8"?>
  <positions>
     <moneyMarket>
       <positionName>1</positionName>
       <notional>10000</notional>
       <currency>EUR</currency>
     </moneyMarket>
     <moneyMarket>
       <positionName>2</positionName>
       <notional>40000</notional>
       <currency>EUR</currency>
     </moneyMarket>
     <moneyMarket>
       <positionName>3</positionName>
       <notional>50000</notional>
       <currency>EUR</currency>
     </moneyMarket>
 </positions>
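
With more than a handful of documents, a variant that does not depend on numbered variables may be easier to manage. The following is a minimal sketch, assuming the XML strings are held in a character vector (called xml_strings here, a hypothetical name) and that each document has a single root whose children should be kept. It relies on the same xml2 functions as the code above (read_xml, xml_children, xml_add_child), plus as.character() to turn the result back into one string.

```r
library(xml2)

# Hypothetical input: one XML string per position
xml_strings <- c(
  "<positions><moneyMarket><positionName>1</positionName><notional>10000</notional><currency>EUR</currency></moneyMarket></positions>",
  "<positions><moneyMarket><positionName>2</positionName><notional>40000</notional><currency>EUR</currency></moneyMarket></positions>",
  "<positions><moneyMarket><positionName>3</positionName><notional>50000</notional><currency>EUR</currency></moneyMarket></positions>"
)

# Parse every string into its own document
docs <- lapply(xml_strings, read_xml)

# The root of the first document serves as the shared root
combined <- docs[[1]]

# Append the children of every remaining document to that root
for (doc in docs[-1]) {
  children <- xml_children(doc)
  for (j in seq_along(children)) {
    xml_add_child(combined, children[[j]])
  }
}

# Serialise the merged document back into a single string
combined_string <- as.character(combined)
cat(combined_string)
```

Because each document is appended directly to the first one, the other documents are left unmodified, and the number of inputs is simply the length of the vector.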