I'm new to R and the ff package, and am trying to better understand how ff allows users to work with large datasets (>4 GB). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head.
I learn best by doing, so as an exercise, I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values. Then I would like to reshape it into wide format. Then I would like to save the output as a csv file.
With small datasets this is simple, and can be achieved using the following script:
##########################################
#Generate the data frame
DF <- data.frame()
for (Subject in 1:6) {
  for (time in 1:11) {
    DF <- rbind(DF, c(Subject, time, runif(1)))
  }
}
names(DF)<-c("Subject","time","conc")
##########################################
#Reshape to wide format
DF<-reshape(DF, v.names = "conc", idvar = "Subject", timevar = "time", direction = "wide")
##########################################
#Save csv file
write.csv(DF,file="DF.csv")
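As an aside, the row-by-row rbind in the loop above can be avoided entirely. A vectorized equivalent using only base R (expand.grid builds the Subject/time combinations in one call; column order is just reshuffled afterwards) would look like this:

```r
# Vectorized construction of the same long-format frame,
# avoiding the slow pattern of growing a data frame inside a loop
DF <- expand.grid(time = 1:11, Subject = 1:6)[, c("Subject", "time")]
DF$conc <- runif(nrow(DF))
```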
But I would like to learn to do this for file sizes of approximately 10 GB. How would I do this using the ff package? Thanks in advance.
The function reshape does not exist explicitly for ffdf objects, but it is quite straightforward to achieve with functionality from the package ffbase. Just use ffdfdply from package ffbase, split by Subject, and apply reshape inside the function. An example on the Indometh dataset with 1,000,000 subjects:
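A minimal sketch of that approach: ffdfdply and write.csv.ffdf are real ffbase functions, but the data construction below is illustrative (built in RAM with expand.grid and then converted to ffdf; n_subjects is kept small here and would be raised towards 1e6 for a genuinely large run).

```r
library(ff)
library(ffbase)

# Illustrative long-format data: n_subjects subjects, 11 time points each.
# Subject is made a factor so it can serve as the split vector below.
n_subjects <- 1000  # increase towards 1e6 for a large-scale run
long <- expand.grid(time = 1:11, Subject = factor(seq_len(n_subjects)))
long$conc <- runif(nrow(long))
FFDF <- as.ffdf(long)  # the data now lives on disk, managed by ff

# Split by Subject: ffdfdply hands each group of rows to FUN as an
# ordinary in-memory data.frame, so the usual reshape() applies.
WIDE <- ffdfdply(FFDF,
                 split = FFDF$Subject,
                 FUN = function(x) {
                   reshape(x, v.names = "conc", idvar = "Subject",
                           timevar = "time", direction = "wide")
                 })

# Write the wide-format ffdf out in chunks, without loading it all into RAM
write.csv.ffdf(WIDE, file = "DF.csv")
```

ffdfdply batches several Subjects into each call to FUN (as many as fit in BATCHBYTES), which is what keeps memory bounded even when the full dataset is far larger than RAM.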