I am trying to convert an XML document to a flat data frame using `xml2`.
Here's some sample code, with the schema section removed. I'm trying to extract all of the "Events" nodes:
```r
library(xml2)

test_xml <- as_xml_document(
'<Root>
  <xs:schema xmlns="address.com" xmlns:mstns="address.com" xmlns:xs="http://www.w3.org/2001/XMLSchema" id="id">
  </xs:schema>
  <NewDataSet xmlns="address.com">
    <Events>
      <VAR1>3119496</VAR1>
      <VAR2>3119496</VAR2>
      <VAR3>text</VAR3>
    </Events>
    <Events>
      <VAR1>3119496</VAR1>
      <VAR2>3119496</VAR2>
      <VAR3>text</VAR3>
    </Events>
  </NewDataSet>
</Root>'
)
```
And here's a picture of my RStudio when I use `read_xml("file_path") %>% View()`:

Based on this, I would expect something like the following to work...
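```r
library(purrr)  # map_df(); purrr also re-exports the %>% pipe

xml_df <- test_xml %>%
  xml_child(2) %>%               # <NewDataSet> is the 2nd child of <Root>
  xml_find_all("//Events") %>%
  map_df(~ { xml_attrs(.x) %>% as.list() } )
```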
...but it doesn't. My guess is that the problem is with my XPath in `xml_find_all()`, but I'm not sure. Any help would be really appreciated!
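One way to test that guess is to compare the unprefixed XPath with one that ignores namespaces. The minimal sketch below uses a trimmed copy of the sample (the empty schema element is omitted, and the variable names are illustrative). It also surfaces a second issue in the attempt above: the `VAR` values are child elements, not attributes, so `xml_attrs()` has nothing to return even once the nodes are found.

```r
library(xml2)

# Trimmed copy of the question's sample (the empty <xs:schema> element is
# omitted). The default namespace on <NewDataSet> is inherited by <Events>.
doc <- read_xml('<Root>
  <NewDataSet xmlns="address.com">
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
  </NewDataSet>
</Root>')

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this returns an empty node set.
no_match <- xml_find_all(doc, "//Events")
length(no_match)

# local-name() ignores the namespace, so both <Events> nodes are found.
events <- xml_find_all(doc, "//*[local-name()='Events']")
length(events)

# The VAR values are child elements, not attributes, so this is empty too.
length(xml_attrs(events[[1]]))
```

Note also that an XPath starting with `//` searches the whole document even when it is evaluated on a node, so the `xml_child(2)` step does not narrow the search; a path starting with `.//` is evaluated relative to that node.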
EDIT: Given that the first answer did not work (before I added the namespaces to this example), I am guessing that the namespaces in the new example are causing the issue.
For anyone who comes across this question in the future: the problem above was caused by the namespaces in the XML.
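If you would rather keep the namespace than strip it, `xml_find_all()` accepts a named character vector of prefix/URI pairs through its `ns` argument. Because XPath compares namespace URIs rather than prefixes, any prefix bound to `address.com` will match the unprefixed `<Events>` elements. A minimal sketch on a trimmed copy of the sample; the prefix `a` and the variable names are arbitrary choices for illustration:

```r
library(xml2)

# Trimmed copy of the question's sample (the empty <xs:schema> element is
# omitted, so <NewDataSet> is the first child of <Root> here).
doc <- read_xml('<Root>
  <NewDataSet xmlns="address.com">
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
  </NewDataSet>
</Root>')

# Bind our own prefix ("a" is arbitrary) to the default namespace URI.
ns <- c(a = "address.com")
events <- xml_find_all(doc, "//a:Events", ns)

# A leading "." keeps the search relative to the <NewDataSet> node,
# which is what the xml_child() step in the question was aiming for.
data_set <- xml_child(doc, 1)
events_in_set <- xml_find_all(data_set, ".//a:Events", ns)
```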
Combining Allan's answer above with the response here, I just needed to either use `xml_ns_strip()` or the code below to turn my XML into a data frame.
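A minimal sketch of the `xml_ns_strip()` route (an illustration, not necessarily the exact code referred to above). It assumes every `<Events>` node has the same child elements in the same order, because `rbind()` combines the rows by position; it uses base R instead of `purrr::map_df()` so it runs without purrr or dplyr installed. Variable names are illustrative.

```r
library(xml2)

# Trimmed copy of the question's sample (the empty <xs:schema> element is omitted).
doc <- read_xml('<Root>
  <NewDataSet xmlns="address.com">
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
  </NewDataSet>
</Root>')

# Remove the default namespace so unprefixed XPath names match again.
# xml_ns_strip() modifies the document in place.
xml_ns_strip(doc)
events <- xml_find_all(doc, "//Events")

# One row per <Events> node: child element names become column names and
# child element text becomes the values (still character at this point).
rows <- lapply(events, function(ev) {
  kids <- xml_children(ev)
  setNames(xml_text(kids), xml_name(kids))
})
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
df
```

All columns come back as character; something like `type.convert()` can be applied afterwards if numeric columns are needed.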