Part of the XML is missing when reading a BGG XML file with the R XML and xml2 packages

I am reading and parsing XML files from the BoardGameGeek XML API2. For certain items, the XML that I get back does not match the full XML document served at the URL. Here is one example:

library(XML)
library(xml2)
bgg_url_api2 <- paste0('https://boardgamegeek.com//xmlapi2/thing?id=',toString(73994),
                     '&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1')

data_api <- readLines(bgg_url_api2)

# Parse once and save only if parsing succeeded
xmlfile_api <- try(xmlParse(data_api))
if (!inherits(xmlfile_api, 'try-error')) {
  saveXML(xmlfile_api, 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')
}

A chunk of the original document is missing from the file I save, most noticeably the "versions" section. I don't know whether that's because the XML is corrupted, badly formed, or something else. I assumed that readLines would read the content at the URL exactly. Is there a way to fix this? Can I somehow copy the XML text from the online file to my file verbatim? Thanks.

Best answer

Try fetching the file with the httr package instead of readLines. It gives you more control over the HTTP request and lets you inspect the response before parsing it. Here's how you can modify your code:


library(httr)
library(XML)

bgg_url_api2 <- paste0('https://boardgamegeek.com//xmlapi2/thing?id=', toString(73994),
                     '&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1')

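# Send an HTTP GET request to the URL
response <- GET(bgg_url_api2)

# Check that the request succeeded and that the body is XML
if (!http_error(response) && http_type(response) %in% c("text/xml", "application/xml")) {
  # Parse the body as XML; asText = TRUE marks the input as XML text, not a file name
  xmlfile_api <- xmlParse(content(response, as = "text", encoding = "UTF-8"), asText = TRUE)

  # Save the XML data to a file
  saveXML(xmlfile_api, file = 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')
} else {
  cat("Failed to retrieve XML data from the URL (HTTP status ", status_code(response), ").\n", sep = "")
}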
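
If the goal is to copy the online file verbatim, one option is to skip parsing and write the response bytes straight to disk. The sketch below uses only base R and runs offline on a small stand-in document; `save_xml_verbatim` is a hypothetical helper name, not part of httr or XML, and the `tag` check is just a plain substring search used here as a rough diagnostic.

```r
# Hypothetical helper: save the bytes untouched, then report whether the
# file on disk matches them and whether a given tag name occurs in it.
save_xml_verbatim <- function(raw_bytes, path, tag = "versions") {
  stopifnot(is.raw(raw_bytes))
  writeBin(raw_bytes, path)  # no parsing or re-serialisation takes place
  saved <- readBin(path, what = "raw", n = file.size(path))
  list(
    identical_on_disk = identical(saved, raw_bytes),
    has_tag = grepl(paste0("<", tag), rawToChar(saved), fixed = TRUE)
  )
}

# Offline demonstration with a tiny stand-in document (not real BGG data)
sample_bytes <- charToRaw(
  "<items><item id=\"1\"><versions><item id=\"2\"/></versions></item></items>"
)
out_path <- tempfile(fileext = ".xml")
result <- save_xml_verbatim(sample_bytes, out_path)
```

With the httr response from the code above, the call would presumably look like `save_xml_verbatim(content(response, as = "raw"), 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')`. If `has_tag` is FALSE at that point, the "versions" section was already absent from what the server sent, so the parser is unlikely to be the cause.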