How can I parse an XML file in R that was probably generated using SSRS?


In my job I have to analyse data that an external organisation shares through a web portal on which I have been granted user access. Various reports are available there, which I can view and download in many formats. Two of these formats are particularly useful: MS Excel and 'XML file with report data'. The Excel file is normally heavily formatted (with sub-totals, merged cells, etc.) to suit Excel users, so converting it to a data frame/table is a big hassle. I therefore prefer to download the XML file, parse it, save the result as CSV, and then carry out my analysis in R.

However, whenever I try to parse the XML file directly in R (to avoid the intermediate conversion to CSV), I never succeed. So far I have tried the XML and xml2 packages in R, but to no avail.

Recently I tried this code.

library("XML")
library("methods")
setwd("C:\\Users\\Administrator\\Desktop\\")
res <- xmlParse("Skil.xml")

> res <- xmlParse("Skil.xml")
xmlns: URI RptSancDig_VoucherCompilationSheet is not absolute

rootnode <- xmlRoot(res)
rootsize <- xmlSize(rootnode)

> rootsize
[1] 2

xmldataframe <- xmlToDataFrame("Skil.xml")

> xmldataframe <- xmlToDataFrame("Skil.xml")
xmlns: URI RptSancDig_VoucherCompilationSheet is not absolute

> xmldataframe 
  Textbox24 Textbox63 DDOName_Collection
1      <NA>      <NA>               <NA>
2                                       

Just to mention, the file size of Skil.xml is about 12.1 MB, and it is parsed successfully in Excel.

I have also tried the read_xml() function of xml2, but to no avail.

I would have happily shared a sample file to try, but I am unable to do so. I am also unable to generate a sample file in that kind of XML format.

Can someone help?
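
A minimal sketch of one approach that may be worth trying, assuming the file follows the usual SSRS export layout, in which values are stored as attributes of repeating elements rather than as element text (that would explain why xmlToDataFrame() returns mostly empty cells). The sample document, the namespace name HypotheticalReport, the element name Details and the attribute names VoucherNo and Amount are all made up for illustration and are not taken from Skil.xml. The xmlns message looks like a warning rather than a parse failure, since xmlRoot() still returns a root node, so the sketch suppresses it and matches elements by local-name() to sidestep the namespace.

```r
library(xml2)

# Hypothetical stand-in for Skil.xml: a default namespace with a
# non-absolute URI and row values stored as attributes (assumed layout).
sample_xml <- '<?xml version="1.0" encoding="utf-8"?>
<Report xmlns="HypotheticalReport" Name="HypotheticalReport">
  <Tablix1>
    <Details_Collection>
      <Details VoucherNo="101" Amount="2500.00" />
      <Details VoucherNo="102" Amount="310.50" />
      <Details VoucherNo="103" />
    </Details_Collection>
  </Tablix1>
</Report>'

# For the real file, pass the path instead: read_xml("Skil.xml").
# suppressWarnings() hides any "URI ... is not absolute" warning.
doc <- suppressWarnings(read_xml(sample_xml))

# Select the repeating row elements by local name, ignoring the namespace.
rows <- xml_find_all(doc, "//*[local-name() = 'Details']")

# Collect every attribute name that occurs on any row.
cols <- unique(unlist(lapply(xml_attrs(rows), names)))

# One column per attribute; rows lacking an attribute get NA.
df <- as.data.frame(
  setNames(lapply(cols, function(nm) xml_attr(rows, nm)), cols),
  stringsAsFactors = FALSE
)

# Attribute values arrive as character, so convert where needed.
df$Amount <- as.numeric(df$Amount)

print(df)
```

To find the right element name in the real file, inspecting xml_name(xml_children(xml_root(doc))) and then descending level by level should show which element repeats once per data row.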
