Summary statistics on out-of-memory file


I have a 120GB CSV file containing numerical values grouped by a categorical variable.

e.g.

df <- data.frame(x = c(rep("BLO", 100), rep("LR", 100)), y = runif(200))
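The kind of summary I'm after, shown in-memory with dplyr as a sketch (means and SDs here are just examples of the statistics I want):

    library(dplyr)

    # In-memory version of the summary I want to compute on the full file
    df %>%
      group_by(x) %>%
      summarise(mean_y = mean(y), sd_y = sd(y), n = n())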

I would like to calculate some summary statistics using group_by(x), but the file doesn't fit into memory. What are my options? I've looked at tidyfst and {disk.frame}, but I'm not sure which (if either) is the right approach, for example compared with processing the file in chunks myself as sketched below. Any help would be much appreciated.
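For instance, I was considering something chunk-based along these lines (a rough sketch only; it assumes the file is "data.csv" with columns x and y, and that the statistics can be built from per-chunk counts and sums):

    library(readr)
    library(dplyr)

    # Accumulate per-group counts and sums chunk by chunk
    partial <- read_csv_chunked(
      "data.csv",
      DataFrameCallback$new(function(chunk, pos) {
        chunk %>%
          group_by(x) %>%
          summarise(n = n(), sum_y = sum(y), .groups = "drop")
      }),
      chunk_size = 1e6
    )

    # Combine the per-chunk partials into overall group means
    result <- partial %>%
      group_by(x) %>%
      summarise(mean_y = sum(sum_y) / sum(n), .groups = "drop")

But this gets awkward for statistics that don't decompose over chunks (medians, quantiles), which is why I'm asking about dedicated packages.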

Thank you.
