I'm new to R and the ff package, and am trying to better understand how ff allows users to work with large datasets (>4 GB). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head.
I learn best by doing, so as an exercise, I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values. Then I would like to reshape it into wide format. Then I would like to save the output as a csv file.
With small datasets this is simple, and can be achieved using the following script:
##########################################
# Generate the data frame
DF <- data.frame()
for (Subject in 1:6) {
  for (time in 1:11) {
    DF <- rbind(DF, c(Subject, time, runif(1)))
  }
}
names(DF) <- c("Subject", "time", "conc")
##########################################
#Reshape to wide format
DF<-reshape(DF, v.names = "conc", idvar = "Subject", timevar = "time", direction = "wide")
##########################################
#Save csv file
write.csv(DF,file="DF.csv")
But I would like to learn to do this for file sizes of approximately 10 GB. How would I do this using the ff package? Thanks in advance.
The function reshape does not exist explicitly for ffdf objects, but it is quite straightforward to do with functionality from package ffbase: use ffdfdply from ffbase, split by Subject, and apply reshape inside the function. An example on the Indometh dataset with 1000000 subjects.
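A minimal sketch of that approach. Note the assumptions: the subject count is scaled down here so it runs quickly (raise n toward 1e6 for the full-size exercise), and the split vector is materialised in RAM with ffd$Subject[] for simplicity; for genuinely out-of-memory data you would keep the split as an ff vector.

```r
library(ff)
library(ffbase)

## Generate a long-format dataset and move it into an ffdf,
## mirroring the structure built in the question
n <- 1000                                      # subjects; use 1e6 for ~10 GB scale
long <- expand.grid(time = 1:11, Subject = 1:n)
long$conc <- runif(nrow(long))
ffd <- as.ffdf(long)

## Reshape to wide format: ffdfdply hands FUN chunks of rows
## (always covering complete Subjects) as ordinary data.frames,
## so base reshape works unchanged inside the function
wide <- ffdfdply(ffd,
                 split = as.character(ffd$Subject[]),
                 FUN = function(x) {
                   reshape(x, v.names = "conc", idvar = "Subject",
                           timevar = "time", direction = "wide")
                 })

## Write the result out chunk-wise, without loading it all into RAM
write.csv.ffdf(wide, file = "DF_wide.csv")
```

Because FUN only ever sees one chunk at a time, peak memory stays bounded by the chunk size (controlled by ffdfdply's BATCHBYTES argument) rather than by the size of the whole dataset.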