This is my first ever stack question, so if I did something wrong please tell me.
I am trying to parse XML data with the xml2 package and possibly the pandas package. Below is an anonymized snapshot of the data.
<?xml version="1.0" encoding="utf-8"?>
<a xmlns:xsd="http://www.y.org/y1/y2" xmlns:xsi="http://www.y.org/y1/y3" xmlns="http://x.nl/">
<b1>1</b1>
<b2>2019-07-01T10:01:35.312+02:00</b2>
<b3>xxx</b3>
<b4>xxx</b4>
<b5>
<c>
<d1>
</d1>
<d2>xxxx</d2>
<d3>
<e1>
</e1>
<e2>
<ID>1</ID>
<f2>XXXXXXXXXXX</f2>
<event>
<eventType>start</eventType>
<eventValue>true</eventValue>
<timestamp>2019-10-07T13:45:00.00+02.00</timestamp>
</event>
<event>
<eventType>next</eventType>
<eventValue>itm1</eventValue>
<timestamp>2019-10-07T13:46:00.00+02.00</timestamp>
</event>
<event>
<eventType>next</eventType>
<eventValue>itm2</eventValue>
<timestamp>2019-10-07T13:47:00.00+02.00</timestamp>
</event>
<event>
<eventType>next</eventType>
<eventValue>itm3</eventValue>
<timestamp>2019-10-07T13:48:00.00+02.00</timestamp>
</event>
I want to create something like the table below.
+-----------+------------+------------------------------+
| EventType | EventValue | timestamp                    |
+-----------+------------+------------------------------+
| start     | true       | 2019-10-07T13:45:00.00+02.00 |
| next      | itm1       | 2019-10-07T13:46:00.00+02.00 |
| next      | itm2       | 2019-10-07T13:47:00.00+02.00 |
| next      | itm3       | 2019-10-07T13:48:00.00+02.00 |
+-----------+------------+------------------------------+
I tried the xml_find_all function to find all events, but I always get {xml_nodeset (0)}.
x <- xml_find_all(data, "//event", xml_ns(data))
Could someone point me in the right direction, and possibly give me a hint on how to create a data frame like the one above? That would be amazing.
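This XML file declares namespaces, including a default one (xmlns="http://x.nl/") that every unprefixed element such as event belongs to. An unprefixed XPath name like "//event" only matches elements in no namespace, which is why the result is empty: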
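As a minimal sketch of how to inspect them, the namespaces can be listed with xml_ns(). The `snapshot` string here is an assumption: a shortened copy of the data from the question with closing tags added so that it parses, not the real file.

```r
library(xml2)

# Hypothetical stand-in for the real file: the snapshot from the question,
# shortened to two events and closed off so it is well-formed.
snapshot <- '<?xml version="1.0" encoding="utf-8"?>
<a xmlns:xsd="http://www.y.org/y1/y2" xmlns:xsi="http://www.y.org/y1/y3" xmlns="http://x.nl/">
  <b5><c><d3><e2>
    <event>
      <eventType>start</eventType>
      <eventValue>true</eventValue>
      <timestamp>2019-10-07T13:45:00.00+02.00</timestamp>
    </event>
    <event>
      <eventType>next</eventType>
      <eventValue>itm1</eventValue>
      <timestamp>2019-10-07T13:46:00.00+02.00</timestamp>
    </event>
  </e2></d3></c></b5>
</a>'

data <- read_xml(snapshot)

# xml_ns() lists every namespace; the default (unprefixed) one is given
# the generated prefix "d1".
ns <- xml_ns(data)
print(ns)

# The unprefixed query from the question finds nothing, even with ns passed in.
n_unprefixed <- length(xml_find_all(data, "//event", ns))
print(n_unprefixed)
```

The generated prefix d1 is what the second approach relies on.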
There are two ways to read nodes from it. The easy way is to remove all the namespaces:
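One way to do this in xml2 is xml_ns_strip(), which drops the default namespace declarations so that plain element names match again. The sketch below assumes the same shortened, closed-off stand-in for the snapshot (`snapshot` is a hypothetical name, not the real file).

```r
library(xml2)

# Hypothetical stand-in for the real file (shortened, closing tags added).
snapshot <- '<?xml version="1.0" encoding="utf-8"?>
<a xmlns:xsd="http://www.y.org/y1/y2" xmlns:xsi="http://www.y.org/y1/y3" xmlns="http://x.nl/">
  <b5><c><d3><e2>
    <event>
      <eventType>start</eventType>
      <eventValue>true</eventValue>
      <timestamp>2019-10-07T13:45:00.00+02.00</timestamp>
    </event>
    <event>
      <eventType>next</eventType>
      <eventValue>itm1</eventValue>
      <timestamp>2019-10-07T13:46:00.00+02.00</timestamp>
    </event>
  </e2></d3></c></b5>
</a>'

data <- read_xml(snapshot)

# xml_ns_strip() modifies the document in place: the default namespace
# is removed, so elements can be addressed without a prefix.
xml_ns_strip(data)

events <- xml_find_all(data, "//event")
print(length(events))
print(xml_text(xml_find_first(events, "./eventType")))
```

Note that the document itself is changed, so this fits best when the namespace information is not needed afterwards.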
Or you can add the prefix to your XPath to get the nodes:
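A sketch of the prefixed query, assuming the default namespace is exposed under the generated prefix d1 (which is how xml_ns() names default namespaces) and using the same hypothetical shortened `snapshot`:

```r
library(xml2)

# Hypothetical stand-in for the real file (shortened, closing tags added).
snapshot <- '<?xml version="1.0" encoding="utf-8"?>
<a xmlns:xsd="http://www.y.org/y1/y2" xmlns:xsi="http://www.y.org/y1/y3" xmlns="http://x.nl/">
  <b5><c><d3><e2>
    <event>
      <eventType>start</eventType>
      <eventValue>true</eventValue>
      <timestamp>2019-10-07T13:45:00.00+02.00</timestamp>
    </event>
    <event>
      <eventType>next</eventType>
      <eventValue>itm1</eventValue>
      <timestamp>2019-10-07T13:46:00.00+02.00</timestamp>
    </event>
  </e2></d3></c></b5>
</a>'

data <- read_xml(snapshot)
ns <- xml_ns(data)

# Same query as in the question, but with the d1: prefix on the element name.
# The document is left untouched.
events <- xml_find_all(data, "//d1:event", ns)
print(length(events))
```

Every step of a longer path needs the prefix as well, for example "//d1:event/d1:eventType".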
Output:
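With either approach the event nodes are matched. To get from the node set to the table asked for in the question, one possible sketch is to pull each child's text with xml_find_first() and xml_text() and combine the columns in a data frame. The `snapshot` string is again a hypothetical, closed-off copy of the data from the question, and the column names are taken from the desired table.

```r
library(xml2)

# Hypothetical stand-in for the real file: the four events from the
# question, with closing tags added so the document parses.
snapshot <- '<?xml version="1.0" encoding="utf-8"?>
<a xmlns:xsd="http://www.y.org/y1/y2" xmlns:xsi="http://www.y.org/y1/y3" xmlns="http://x.nl/">
  <b5><c><d3><e2>
    <event><eventType>start</eventType><eventValue>true</eventValue>
      <timestamp>2019-10-07T13:45:00.00+02.00</timestamp></event>
    <event><eventType>next</eventType><eventValue>itm1</eventValue>
      <timestamp>2019-10-07T13:46:00.00+02.00</timestamp></event>
    <event><eventType>next</eventType><eventValue>itm2</eventValue>
      <timestamp>2019-10-07T13:47:00.00+02.00</timestamp></event>
    <event><eventType>next</eventType><eventValue>itm3</eventValue>
      <timestamp>2019-10-07T13:48:00.00+02.00</timestamp></event>
  </e2></d3></c></b5>
</a>'

data <- read_xml(snapshot)
ns <- xml_ns(data)
events <- xml_find_all(data, "//d1:event", ns)

# xml_find_first() returns one result per event (missing if the child is
# absent), so the three columns always have the same length.
df <- data.frame(
  EventType  = xml_text(xml_find_first(events, "./d1:eventType", ns)),
  EventValue = xml_text(xml_find_first(events, "./d1:eventValue", ns)),
  timestamp  = xml_text(xml_find_first(events, "./d1:timestamp", ns)),
  stringsAsFactors = FALSE
)
print(df)
```

The timestamps are kept as character strings here; converting them to date-times is a separate step and depends on the offset format in the real data.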