Part of the XML is missing when reading a BGG XML file with the R XML and xml2 packages

I am reading and parsing XML files from the BoardGameGeek XML API2. For certain items, the XML that I get back does not match the full XML document served at the URL. Here is one example:

library(XML)
library(xml2)
bgg_url_api2 <- paste0('https://boardgamegeek.com//xmlapi2/thing?id=',toString(73994),
                     '&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1')

data_api <- readLines(bgg_url_api2)

# Parse once, and save only if parsing did not fail
xmlfile_api <- try(xmlParse(data_api))
if (!inherits(xmlfile_api, 'try-error')) {
  saveXML(xmlfile_api, 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')
}

What happens is that a chunk of the original file is missing from the file I save, especially the "versions" section. I don't know if that's because it's corrupted or bad xml style or something else. I thought that using readLines would read the url exactly. Is there a way to fix this? Can I somehow just literally copy the xml text/code in the online file to my file? Thanks.

There is 1 best solution below

Marwi On BEST ANSWER

You should use the httr package, which provides more control over HTTP requests and responses. Here's how you can modify your code to fetch the XML data reliably:


library(httr)
library(XML)

bgg_url_api2 <- paste0('https://boardgamegeek.com//xmlapi2/thing?id=', toString(73994),
                     '&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&pricehistory=1&comments=1')

# Send an HTTP GET request to the URL
response <- GET(bgg_url_api2)

# Check that the request succeeded and that the server returned XML
if (!http_error(response) && http_type(response) == "text/xml") {
  # Parse the response body as XML text
  xmlfile_api <- xmlParse(content(response, "text", encoding = "UTF-8"), asText = TRUE)

  # Save the XML data to a file
  saveXML(xmlfile_api, 'D:\\BGG\\BGG_xml_files_api2\\bgg_test.xml')
} else {
  cat("Failed to retrieve XML data from the URL.\n")
}
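
On the question of literally copying the online XML to a file: if the goal is only to save what the server sends, one option is to skip parsing and write the response bytes straight to disk. The sketch below uses only base R. The helper name save_raw_xml and the crude text check for a "<versions" tag are illustrative assumptions, not part of the BGG API or of any package, and this only confirms what was downloaded; it does not by itself explain why a section might be missing.

```r
# Hypothetical helper: copy the body at `url` to `dest` byte-for-byte,
# then report whether the saved text contains a <versions element.
save_raw_xml <- function(url, dest) {
  # mode = "wb" writes the bytes unchanged (no line-ending conversion)
  status <- download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  if (status != 0) stop("Download failed with status ", status)

  # Crude sanity check on the saved file; a real check would parse the XML
  saved <- readLines(dest, warn = FALSE)
  any(grepl("<versions", saved, fixed = TRUE))
}

# Usage, assuming bgg_url_api2 is defined as in the code above:
# has_versions <- save_raw_xml(bgg_url_api2, "bgg_test.xml")
```

If the check returns FALSE for the raw download as well, the section is missing from the server's response itself rather than being lost during parsing or saving.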