I am fairly new to R and need some help to extract file names and file properties and combine them with data extracted from multiple XML files (about 200), which should then be converted into a data frame.
I am using the following script to select the XML files, extract the data and convert it into a data frame (it works without errors):
library(XML)
library(plyr)
# Select multiple xml files within directory
FileName <- list.files(pattern = "xml$",
ignore.case=TRUE,
full.names = FALSE)
# Create function to extract data from one file
RI_ID <- function(FileName) {
  doc1 <- xmlParse(FileName)
  # One row per <o> node under ObjectList[@ObjectType='pkg']
  xmlToDataFrame(doc1["//ObjectList[@ObjectType='pkg']/o"])
}
# Convert to dataframe
T1 <- ldply(FileName,RI_ID)
# Rename columns
names(T1)[names(T1) == "a"] <- "UniqueInstallationPackageID"
names(T1)[names(T1) == "b"] <- "PackageVersion_Latest"
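# Convert to numeric (via as.character() in case the columns are factors,
# since as.numeric() on a factor returns level codes, not the values)
FieldToNumeric <- c("UniqueInstallationPackageID", "PackageVersion_Latest")
T1[, FieldToNumeric] <- lapply(T1[, FieldToNumeric],
                               function(x) as.numeric(as.character(x)))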
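A small base-R illustration of why the numeric conversion step can silently go wrong: depending on the R and XML package versions, xmlToDataFrame() may return factor columns (R versions before 4.0.0 defaulted to stringsAsFactors = TRUE), and as.numeric() on a factor returns the internal level codes rather than the numbers shown. Converting through as.character() first avoids this. T_demo is a made-up data frame for illustration, not output from the real files.

```r
# Made-up example: numeric-looking values stored as factors
T_demo <- data.frame(
  UniqueInstallationPackageID = factor(c("10", "5", "20")),
  PackageVersion_Latest = factor(c("3", "1", "2"))
)
FieldToNumeric <- c("UniqueInstallationPackageID", "PackageVersion_Latest")

# as.numeric() on a factor gives level codes; levels sort as "10", "20", "5"
as.numeric(T_demo$UniqueInstallationPackageID)   # 1 3 2

# Going through as.character() recovers the actual values
T_demo[, FieldToNumeric] <- lapply(T_demo[, FieldToNumeric],
                                   function(x) as.numeric(as.character(x)))
T_demo$UniqueInstallationPackageID               # 10 5 20
```

If the columns are already character (the default from R 4.0.0 onwards), the extra as.character() call is harmless.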
I would like help to:
- extract the last-modified date of each XML file, as it appears in Windows Explorer;
- include the file name and the modified date as columns in the final data frame.
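A base-R sketch of both points: file.info() returns one row per file, and its mtime column is the last-modified time as POSIXct, which should correspond to the "Date modified" column in Windows Explorer (shown in the local time zone); file.mtime() is a shorthand for that column. Here read_one() is a hypothetical stand-in for the question's RI_ID() so the sketch runs without the XML package, and the two files are throwaway examples written to a temporary directory.

```r
# Throwaway example files in a temporary directory (stand-ins for the real XML files)
demo_dir <- tempfile("mtime_demo")
dir.create(demo_dir)
writeLines("<root/>", file.path(demo_dir, "a.xml"))
writeLines("<root/>", file.path(demo_dir, "b.xml"))

files <- list.files(demo_dir, pattern = "\\.xml$", ignore.case = TRUE,
                    full.names = TRUE)

# One row per file; the mtime column holds the last-modified time (POSIXct)
info <- file.info(files)
print(info[, "mtime", drop = FALSE])

# Hypothetical stand-in for RI_ID(): returns the parsed rows of one file
read_one <- function(path) {
  data.frame(a = c("1", "2"), b = c("3", "4"), stringsAsFactors = FALSE)
}

# Attach the file name and modification time to every row of each file;
# rep(..., nrow(df)) keeps this working for files that yield zero rows
per_file <- lapply(files, function(path) {
  df <- read_one(path)
  df$FileName <- rep(basename(path), nrow(df))
  df$ModifiedDate <- rep(file.mtime(path), nrow(df))
  df
})

# Stack the per-file data frames into one
combined <- do.call(rbind, per_file)
str(combined)
```

basename() keeps only the file name; drop it if the full path is more useful as an identifier.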
I have reviewed the following two sources, but did not have any success in implementing them:
- http://datacornering.com/how-to-combine-files-with-r-and-add-filename-column/
- Read multiple xml files in R and combine the data
Due to a confidentiality agreement, I cannot share an example of the XML file, but if need be I can rename the nodes etc. and post it. Thank you for your help.
Simply adjust the RI_ID method to retrieve those two pieces of information (the modified date/time with file.info() and the file name from the FileName variable) and bind those values as new columns of the XML data frame. transform() is convenient here because it adds columns to a data frame with comma-separated assignments.
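A minimal sketch of that approach, keeping the question's XML and plyr workflow. It assumes a hypothetical XML layout (an ObjectList element with ObjectType='pkg' containing <o> nodes with <a> and <b> children), inferred from the XPath and column names in the question; the two sample files are generated in a temporary directory only so the sketch runs on its own. Inside RI_ID, transform() appends FileName and ModifiedDate, and the single values are recycled to every row extracted from that file.

```r
library(XML)
library(plyr)

# Hypothetical sample data: layout assumed from the XPath in the question
xml_dir <- tempfile("ri_demo")
dir.create(xml_dir)
sample_xml <- function(ids) {
  rows <- paste0("<o><a>", ids, "</a><b>1</b></o>", collapse = "")
  paste0("<root><ObjectList ObjectType='pkg'>", rows, "</ObjectList></root>")
}
writeLines(sample_xml(c(101, 102)), file.path(xml_dir, "pkg1.xml"))
writeLines(sample_xml(c(201, 202)), file.path(xml_dir, "pkg2.xml"))

FileName <- list.files(xml_dir, pattern = "\\.xml$", ignore.case = TRUE,
                       full.names = TRUE)

# Extract the <o> rows from one file, then append the file's name and
# last-modified time as two extra columns
RI_ID <- function(FileName) {
  doc1 <- xmlParse(FileName)
  doc <- xmlToDataFrame(doc1["//ObjectList[@ObjectType='pkg']/o"],
                        stringsAsFactors = FALSE)
  transform(doc,
            FileName = basename(FileName),
            ModifiedDate = file.info(FileName)$mtime)
}

# Same combining step as in the question: one data frame for all files
T1 <- ldply(FileName, RI_ID)
str(T1)
```

The renaming and numeric conversion from the question can then be applied to T1 unchanged, since they only touch the a and b columns.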