How to extract XML attributes and process them into a data frame?

I am a beginner in R programming.

I would like to scrape football data from Squawka and put it into a data frame so that I can analyse it (football analytics is a new hobby of mine). Specifically, I want data from pages like this one: http://eredivisie.squawka.com/willem-ii-vs-psv/10-08-2014/dutch-eredivisie/matches.

On Stack Overflow I found a thread explaining how to do this ("how to scrape this squawka page?").

Unfortunately, when I run the code from that thread for processing the XML attributes into a data frame (see below), I get the following error message:

```
Error in (function (..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = default.stringsAsFactors()) : numbers of columns of arguments do not match
```

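```r
library(XML)  # provides xmlAttrs() and xmlValue()

# `example` is the list of XML nodes built in the linked thread (not shown here)
data <- lapply(example, function(x) {
  if (length(x['event']) > 0) {
    res <- lapply(x['event'], function(y) {
      matchAttrs <- as.list(xmlAttrs(y))              # attributes of one <event> node
      matchAttrs$start <- xmlValue(y['start']$start)  # text of its <start> child
      matchAttrs$end <- xmlValue(y['end']$end)        # text of its <end> child
      matchAttrs
    })
    return(do.call(rbind.data.frame, res))
  }
})
```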
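The message comes from `rbind.data.frame()`, which requires every argument to have the same number of columns. A likely cause, stated here as an assumption, is that not every event node carries the same set of attributes, so the `matchAttrs` lists have different lengths. The sketch below uses two hypothetical toy lists (not real Squawka data) to reproduce the same message without any scraping:

```r
# Hypothetical toy events: the second one lacks the `type` field, mimicking
# an event whose XML node carries fewer attributes than the others.
ev1 <- list(player_id = "531", mins = "4", type = "Failed")
ev2 <- list(player_id = "311", mins = "6")

# rbind.data.frame() needs every argument to have the same number of columns,
# so binding these two lists should fail; capture the error text instead of stopping.
msg <- tryCatch(
  do.call(rbind.data.frame, list(ev1, ev2)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this prints the same "numbers of columns of arguments do not match" text, the events on the real page most likely differ in their attributes.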

The result should look something like this:

```
        player_id mins secs minsec team   type     start       end
event         531    4   39    279   44 Failed 73.1,87.1 97.9,49.1
event5        311    6   33    393   31 Failed 92.3,13.1 93.0,31.0
event1        376    8   57    537   31 Failed  97.7,6.1 96.7,16.4
event6        311   13   50    830   31 Failed  99.5,0.5 94.9,42.6
event11       311   14   11    851   31 Failed  99.5,0.5 93.1,51.0
event7        311   17   41   1061   31 Failed 99.5,99.5 92.6,50.1
```
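One way to get a single data frame shaped like the table above, even when events differ in their attributes, is to give each event `NA` for the fields it lacks before binding. This is a minimal base-R sketch under that assumption: `bind_fill()` is a hypothetical helper, not part of the linked thread or any package, and the sample lists reuse two rows of the table with `type` dropped from the second one purely for illustration.

```r
# Hypothetical sample: lists shaped like `matchAttrs` in the question,
# taken from two table rows; `type` is left out of the second event on purpose.
res <- list(
  event  = list(player_id = "531", mins = "4", secs = "39", type = "Failed",
                start = "73.1,87.1", end = "97.9,49.1"),
  event5 = list(player_id = "311", mins = "6", secs = "33",
                start = "92.3,13.1", end = "93.0,31.0")
)

# Hypothetical helper: bind a list of named lists into one data frame,
# giving each element NA for any field it lacks so all rows share the same columns.
bind_fill <- function(rows) {
  cols <- unique(unlist(lapply(rows, names)))         # union of all field names
  padded <- lapply(rows, function(r) {
    r[setdiff(cols, names(r))] <- NA                  # add missing fields as NA
    as.data.frame(r[cols], stringsAsFactors = FALSE)  # same column order everywhere
  })
  do.call(rbind, padded)  # list names ("event", "event5") become row names
}

df <- bind_fill(res)
print(df)
```

In the question's code, `bind_fill(res)` could take the place of `do.call(rbind.data.frame, res)`. If an extra dependency is acceptable, `data.table::rbindlist(res, fill = TRUE)` also fills missing columns with `NA`.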

I have tried several other solutions from Stack Overflow for similar situations, but so far I have not managed to find one that works.
