I've created a script that reads data from two GitHub repositories, reformats the datasets, binds them together by rows, and then writes the result to a new .csv file. I then scheduled the script to run every hour using the cronR package.
Here's my code:
```r
devtools::install_github("tidyverse/googlesheets4")
library(dplyr)
library(googlesheets4)
library(RCurl)

setwd(dir = "YOUR_WORKING_DIRECTORY")

###############################################################################
#================== TIME SERIES DATA FOR CASES AND DEATHS ====================#
###############################################################################

# 1. #####==== DATASETS =====#####

# 1.1 ###= Cases #####
# These files are updated on GitHub every day.
cases <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Cases_Cantons_CH_total.csv"),
                  header = TRUE,
                  stringsAsFactors = FALSE,
                  na.strings = c("", "NA"),
                  encoding = "UTF-8")

# Remove the rows for the whole of Switzerland (CH) and for Liechtenstein (FL)
cases <- subset(x = cases,
                !is.element(el = canton,
                            set = c("CH", "FL")),
                select = c("date",
                           "canton",
                           "tested_pos"))
names(cases)[1] <- "Date"

# Restructure the dataset from long to wide: one row per date, one column per canton
cases <- reshape(data = cases,
                 idvar = "Date",
                 timevar = "canton",
                 v.names = "tested_pos",
                 direction = "wide")
# Strip the "tested_pos." prefix so the columns are named after the cantons
names(cases) <- gsub(pattern = "tested_pos.",
                     replacement = "",
                     x = names(cases),
                     fixed = TRUE)
cases[is.na(cases)] <- 0
cases <- cases[order(cases$Date,
                     decreasing = FALSE), ]

# More up-to-date dataset
cases2 <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland.csv"),
                   header = TRUE,
                   stringsAsFactors = FALSE,
                   na.strings = c("", "NA"),
                   encoding = "UTF-8")

# Remove the total daily cases for Switzerland
cases2 <- subset(x = cases2,
                 select = -c(CH))

# Bind the two cases datasets by rows
cases_tot <- bind_rows(cases[1:7, ],
                       cases2)
rownames(cases_tot) <- seq_len(nrow(cases_tot))

write.csv(x = cases_tot,
          file = file.path(getwd(), "cases_tot.csv"),
          row.names = FALSE,
          quote = FALSE)
```
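To make the reshaping step concrete, here is a toy version of it. The data frame `long` and its values are invented for illustration. Only the `reshape()` call, the prefix stripping and the NA-to-0 replacement mirror the script.

```r
# Invented long-format data: one row per (date, canton) pair
long <- data.frame(Date = c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"),
                   canton = c("AG", "ZH", "AG", "ZH"),
                   tested_pos = c(1, 5, 3, NA),
                   stringsAsFactors = FALSE)

# Long to wide: one row per Date, one "tested_pos.<canton>" column per canton
wide <- reshape(data = long,
                idvar = "Date",
                timevar = "canton",
                v.names = "tested_pos",
                direction = "wide")

# Drop the "tested_pos." prefix (fixed = TRUE so "." is not a regex wildcard)
names(wide) <- gsub("tested_pos.", "", names(wide), fixed = TRUE)

# Missing counts become 0, then sort by date
wide[is.na(wide)] <- 0
wide <- wide[order(wide$Date), ]
print(wide)
```

The resulting columns are `Date`, `AG` and `ZH`, which is the same layout as `cases2`.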
When I run the script manually, everything works and the resulting .csv is fine. But when I schedule it with the cronR package (in the RStudio IDE: Addins -> Schedule R scripts on Linux/Unix), the saved .csv differs, and only in the "Date" column. The dates from the first dataset are in the first column, but the dates from the second dataset (the one bound to the first with bind_rows()) are placed at the end of the dataset, and their column header has a new, strange name (as you can see from this image).
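For context, my understanding is that `bind_rows()` matches columns by name, not by position. If the date column of the second data frame came in under any name other than "Date", its values would end up in a new column appended at the end, with `NA` filling the gaps. Below is a toy sketch of that behaviour. The column name `X.Date` is hypothetical and only stands in for the "strange" header. It does not claim to be the name I actually get.

```r
library(dplyr)

# First data frame: the date column is called "Date", as in the script
first <- data.frame(Date = c("2020-03-01", "2020-03-02"),
                    AG = c(1, 3),
                    stringsAsFactors = FALSE)

# Second data frame: same layout, but the date column has a different
# (hypothetical) name
second <- data.frame(X.Date = c("2020-03-03", "2020-03-04"),
                     AG = c(4, 6),
                     stringsAsFactors = FALSE)

# Columns are matched by name; X.Date has no match in `first`,
# so it is appended after Date and AG
combined <- bind_rows(first, second)
print(combined)

# Column names present in the second input but not in the first
print(setdiff(names(second), names(first)))
```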
Do you have any idea of what could be the problem? Thanks a lot!
P.S.: I work on a MacBook Pro (late 2016, 8 GB of RAM) running macOS Catalina.