xml_nodeset to tibble, one row per xml_nodeset (item)


I have a complicated XML file whose items are first-level child nodes. The items differ in structure, and some attributes are missing from some of them. I need to store each item (node) in its own tibble row, so that I can keep track of missing attributes and write a function that handles all variants.

I found a solution for the first step by Felix Ebert: https://stackoverflow.com/questions/49253021/how-to-extract-xml-attr-and-xml-text-on-different-levels-with-xml2-and-purrr

I copy part of the code here:

xml <- xml2::read_xml("input/example.xml")
rows <- xml %>% xml_find_all("//xmlsubsubnode")
rows_df <- data_frame(node = rows)

The function data_frame() has been deprecated, and I get error messages if I replace it with any of:

tibble()
as_tibble()
data.frame()

With tibble() I get the following error:

df_articles <- tibble(item = xml_articles)
Error:
! All columns in a tibble must be vectors.
✖ Column `item` is a `xml_nodeset` object.
Backtrace:
1. tibble::tibble(item = xml_articles)
2. tibble:::tibble_quos(xs, .rows, .name_repair)
3. tibble:::check_valid_col(res, col_names[[j]], j)
4. tibble:::check_valid_cols(set_names(list(x), name))

I would be grateful if anybody can update the original post.
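
The error message says that tibble() does not accept an xml_nodeset as a column. One possible workaround, given here only as a minimal sketch, is to turn the nodeset into a plain list so that each node becomes one element of a list-column. The inline XML, the element name item, the attribute id and the child element title are all hypothetical stand-ins for the real file, which is not shown in the question.

```r
library(xml2)
library(tibble)

# Hypothetical stand-in for input/example.xml: three items with
# different structure (missing attribute, missing child element).
doc <- read_xml('<root>
  <item id="a1"><title>First</title></item>
  <item><title>Second</title></item>
  <item id="a3"/>
</root>')

xml_articles <- xml_find_all(doc, "//item")

df_articles <- tibble(
  # lapply() returns a plain list of xml_node objects, which tibble
  # accepts as a list-column (one node per row).
  item  = lapply(xml_articles, function(node) node),
  # xml_attr() gives NA where the attribute is absent.
  id    = xml_attr(xml_articles, "id"),
  # xml_find_first() keeps the input length and yields a missing node
  # where nothing matches, so xml_text() gives NA for those rows.
  title = xml_text(xml_find_first(xml_articles, "./title"))
)
```

With this layout, missing attributes and child elements show up as NA in their columns, and the original node stays available in the item column for any further per-row extraction.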
