Using xml2, I have written code that transforms one of my XML files into the data frame I want. I now need to repeat this for the other 1218 XML files in my folder, but I am struggling to work out where to start. I know I need to list the files:
files <- list.files(pattern = "\\.xml$")
After that I will need a loop or sapply, but I'm not sure how to set it up. Any advice would be much appreciated.
Code so far:
library(xml2)
library(dplyr)  # provides %>%, tibble() and bind_rows()

xmlimport <- read_xml("16770601.xml")
class(xmlimport)

# Every trial account in the file
trialaccounts <- xmlimport %>% xml_find_all('//div1[@type="trialAccount"]')

defendants <- NULL
for (i in seq_along(trialaccounts)) {
  trialid <- trialaccounts[[i]] %>% xml_attr("id")
  year <- trialaccounts[[i]] %>%
    xml_find_first('.//interp[@type="year"]') %>%
    xml_attr("value")
  genderdefendants <- trialaccounts[[i]] %>%
    xml_find_all('.//persName[@type="defendantName"]/interp[@type="gender"]') %>%
    xml_attr("value")
  descrip <- trialaccounts[[i]] %>%
    xml_find_all('.//persName[@type="defendantName"]') %>%
    xml_text(trim = TRUE)
  verdict <- trialaccounts[[i]] %>%
    xml_find_all('.//interp[@type="verdictCategory"]') %>%
    xml_attr("value")
  context <- xml_text(trialaccounts[[i]])

  # Append the rows for each defendant gender found in this trial
  for (j in seq_along(genderdefendants)) {
    defendants <- defendants %>%
      bind_rows(tibble(defendantid = i, trial_id = trialid, year_tried = year,
                       description = descrip, verdict_result = verdict,
                       info = context, gender = genderdefendants[j]))
  }
}
I would recommend writing a function that parses one XML file and using the purrr package to map it over your file list, storing the output in result. This will give you a list of length 1218. You can access the first result with result[[1]]. Or, if you want to combine all results into one table, bind the list elements together by row, e.g. with bind_rows from dplyr.
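A minimal sketch of that approach, assuming the loop from the question is simply wrapped in a function. The names parse_trial_file, all_defendants and the file column are made up for illustration, and the per-trial logic is copied from the question unchanged rather than checked against the real files:

```r
library(xml2)
library(dplyr)   # %>%, bind_rows()
library(tibble)  # tibble()
library(purrr)   # map(), set_names()

# Hypothetical helper: parse one XML file into a tibble (NULL if nothing is found).
parse_trial_file <- function(path) {
  trialaccounts <- read_xml(path) %>%
    xml_find_all('//div1[@type="trialAccount"]')

  defendants <- NULL
  for (i in seq_along(trialaccounts)) {
    trial <- trialaccounts[[i]]
    trialid <- xml_attr(trial, "id")
    year <- trial %>%
      xml_find_first('.//interp[@type="year"]') %>%
      xml_attr("value")
    genderdefendants <- trial %>%
      xml_find_all('.//persName[@type="defendantName"]/interp[@type="gender"]') %>%
      xml_attr("value")
    descrip <- trial %>%
      xml_find_all('.//persName[@type="defendantName"]') %>%
      xml_text(trim = TRUE)
    verdict <- trial %>%
      xml_find_all('.//interp[@type="verdictCategory"]') %>%
      xml_attr("value")
    context <- xml_text(trial)

    for (j in seq_along(genderdefendants)) {
      defendants <- bind_rows(
        defendants,
        tibble(defendantid = i, trial_id = trialid, year_tried = year,
               description = descrip, verdict_result = verdict,
               info = context, gender = genderdefendants[j])
      )
    }
  }
  defendants
}

# List the files (dot escaped so only names ending in ".xml" match).
files <- list.files(pattern = "\\.xml$", full.names = TRUE)

# One list element per file, named by file path.
result <- map(set_names(files), parse_trial_file)

# Optional: one combined table, with the source file recorded in column "file".
all_defendants <- bind_rows(result, .id = "file")
```

Naming the list with set_names is a design choice, not a requirement: it lets bind_rows record which file each row came from, which helps when tracking down a file that parsed oddly.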