I have an object which is available through a URL in JSON format, and part of it I want to put into a dataframe format to analyse the data in R. I currently do this as follows:
# Read data in using fromJSON function
data <- RJSONIO::fromJSON('https://api-prod.footballindex.co.uk/football.allTradable24hrchanges?page=1&per_page=5000&sort=asc')
# The above is an example dataset to use for this question
# In my actual dataset there are fields which only exist in some elements in list so adding this onto example
# I want to handle these by returning in the end dataframe with NA if it doesn't exist
data[['items']][[1]]$newField <- 1
# It is only the data in the items field I am interested in
# Unlist each element to get all nested emelents within the lists in flat format
dataList <- lapply(data[['items']], unlist)
# Combine all elemnts of list together
dataDF <- dplyr::bind_rows(dataList)
# Convert into data.frame
dataDF <- data.frame(dataDF)
This works, however the bind_rows
part takes a long time
> system.time(dataDF <- dplyr::bind_rows(dataList))
user system elapsed
42.195 0.000 42.216
It feels like there must be a quicker way of doing this.
I am told that data.table::rbindlist
is a faster alternative but using this gives me error message
> dataDF <- data.table::rbindlist(dataList)
Error in data.table::rbindlist(dataList) :
Item 1 of input is not a data.frame, data.table or list
Had a suggestion of using do.call(rbind...
in an answer which runs and is fast, but when there are fields which are only in some elements it does not handle this properly. For example
dataDF2 <- data.frame(do.call(rbind, dataList))
> head(dataDF$country)
[1] "Côte d'Ivoire" "Italy" "England" "Scotland" "Germany" "France"
> head(dataDF2$country)
[1] "Côte d'Ivoire" "1.65" "1.62" "FALSE" "2.59" "France"
Thanks in advance for any help