I am reading and parsing XML files from the BoardGameGeek XML API2. For some items, the XML I get back does not match the full document served at the URL. Here is one example:
library(XML)
library(xml2)
# Build the API2 request URL for item 73994
bgg_url_api2 <- paste0('https://boardgamegeek.com/xmlapi2/thing?id=', toString(73994),
  '&type=boardgame,boardgameexpansion,boardgameaccessory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1')
data_api <- readLines(bgg_url_api2)
# Parse once and save only if parsing succeeded
xmlfile_api <- try(xmlParse(data_api))
if (!inherits(xmlfile_api, 'try-error')) {
  saveXML(xmlfile_api, 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')
}
What happens is that a chunk of the original document is missing from the file I save, most notably the "versions" section. I don't know whether the response is corrupted, the XML is malformed, or something else is going on. I assumed that readLines would read the URL's content exactly. Is there a way to fix this? Can I simply copy the XML text from the online document into my file verbatim? Thanks.
You should use the httr package, which provides more control over HTTP requests and responses. Here's how you can modify your code to fetch the XML data reliably:
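A minimal sketch of that approach, assuming the httr and xml2 packages are installed. The helper names build_bgg_url and save_bgg_xml, the 60-second timeout, and the output file name are illustrative choices, not part of the original code. The idea is to fetch the raw response bytes, check the HTTP status, confirm the body is well-formed XML, and then write the bytes unchanged so the saved file is a literal copy of what the server sent. Whether this restores the missing "versions" section is not confirmed here.

```r
# Requires the httr and xml2 packages; they are called with :: so nothing is attached.

# Hypothetical helper: build the API2 "thing" URL for one item id,
# using the same query string as in the question.
build_bgg_url <- function(id) {
  paste0(
    "https://boardgamegeek.com/xmlapi2/thing?id=", sprintf("%d", as.integer(id)),
    "&type=boardgame,boardgameexpansion,boardgameaccessory,rpgitem,rpgissue,videogame",
    "&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1"
  )
}

# Hypothetical helper: download the response and write its bytes unchanged.
save_bgg_xml <- function(id, path) {
  resp <- httr::GET(build_bgg_url(id), httr::timeout(60))
  httr::stop_for_status(resp)                  # stop with an error on HTTP 4xx/5xx
  raw_body <- httr::content(resp, as = "raw")  # response body as a raw vector
  xml2::read_xml(raw_body)                     # errors if the body is not well-formed XML
  writeBin(raw_body, path)                     # literal copy, no re-serialisation
  invisible(path)
}

# Example call (needs network access):
# save_bgg_xml(73994, "bgg_test.xml")
```

Writing the raw bytes with writeBin avoids both the line splitting done by readLines and the re-serialisation done by saveXML, so any difference between the saved file and the online document would have to come from the server response itself.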