I'm new to R and the ff package, and am trying to better understand how ff allows users to work with large datasets (>4 GB). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head.
I learn best by doing, so as an exercise, I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values. Then I would like to reshape it into wide format. Then I would like to save the output as a csv file.
With small datasets this is simple, and can be achieved using the following script:
##########################################
# Generate the data frame
DF <- data.frame()
for (Subject in 1:6) {
  for (time in 1:11) {
    DF <- rbind(DF, c(Subject, time, runif(1)))
  }
}
names(DF) <- c("Subject", "time", "conc")
##########################################
#Reshape to wide format
DF<-reshape(DF, v.names = "conc", idvar = "Subject", timevar = "time", direction = "wide")
##########################################
#Save csv file
write.csv(DF,file="DF.csv")
But I would like to learn to do this for file sizes of approximately 10 GB. How would I do this using the ff package? Thanks in advance.
The function reshape does not exist explicitly for ffdf objects, but it is quite straightforward to do with functionality from package ffbase: use ffdfdply from ffbase, split by Subject, and apply reshape inside the function. An example on the Indometh dataset with 1000000 subjects.
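A minimal sketch of that approach. Note the assumptions: the subject count is scaled down here so it runs quickly (raise n toward 1e6 for the full-size exercise), and the split vector is materialised in RAM with ffd$Subject[] for simplicity; for genuinely out-of-memory data you would keep the split as an ff vector.

```r
library(ff)
library(ffbase)

## Generate a long-format dataset and move it into an ffdf,
## mirroring the structure built in the question
n <- 1000                                      # subjects; use 1e6 for ~10 GB scale
long <- expand.grid(time = 1:11, Subject = 1:n)
long$conc <- runif(nrow(long))
ffd <- as.ffdf(long)

## Reshape to wide format: ffdfdply hands FUN chunks of rows
## (always covering complete Subjects) as ordinary data.frames,
## so base reshape works unchanged inside the function
wide <- ffdfdply(ffd,
                 split = as.character(ffd$Subject[]),
                 FUN = function(x) {
                   reshape(x, v.names = "conc", idvar = "Subject",
                           timevar = "time", direction = "wide")
                 })

## Write the result out chunk-wise, without loading it all into RAM
write.csv.ffdf(wide, file = "DF_wide.csv")
```

Because FUN only ever sees one chunk at a time, peak memory stays bounded by the chunk size (controlled by ffdfdply's BATCHBYTES argument) rather than by the size of the whole dataset.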