I've been trying for quite some time to get my test data to split.
> FDF <- read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\TestData0812.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
> names(FDF)<- c('Date','Time','Open','High','Low','Close')
>
> # Take a section of the import for testing
> FDF2 <-FDF[1:100,]
> FDF2 <- as.ffdf(FDF2)
> a <- nrow(FDF2)
> FDF2[1:3,]
Date Time Open High Low Close
1 1987.08.28 12:00 1.6238 1.6240 1.6237 1.6239
2 1987.08.28 12:01 1.6239 1.6240 1.6235 1.6236
3 1987.08.28 12:02 1.6236 1.6239 1.6235 1.6238
>
> # Create an ID column and bind it to the data
> ID <- data.frame(matrix(1:a, nrow = a, ncol=1 ))
> ID <- as.ffdf(ID)
> names(ID) <- c('ID')
> FDF3 <- cbind.ffdf2(ID, FDF2)
> FDF3[1:3,]
ID Date Time Open High Low Close
1 1 1987.08.28 12:00 1.6238 1.6240 1.6237 1.6239
2 2 1987.08.28 12:01 1.6239 1.6240 1.6235 1.6236
3 3 1987.08.28 12:02 1.6236 1.6239 1.6235 1.6238
The file I will actually be using this on is 700 MB, which is why it is an ffdf object. How can I split the dataset?
My current code is:
T = ffdfdply(FDF3, split(FDF3$ID, rep(1:10,each=10)))
I have tried quite a few variations of this and searched this forum and others, but for simplicity I've included only the example above.
When run, the code above gives the following error:
Error in ffdfdply(FDF3, split(FDF3$ID, rep(1:10, each = 10))) :
split needs to be the same length as the number of rows in x
I can't understand why a split of rep(1:10, each = 10) does not work on a data set with 100 rows:
> dim(FDF3)
[1] 100 7
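One possible reading of the error, sketched in base R: split() returns a list with one element per group, so what gets passed has length 10 rather than 100. The plain data frame df below is only a stand-in for FDF3 (an assumption, so that the sketch runs without ff).

```r
# Plain data.frame stand-in for FDF3 (assumption: only the ID column matters here)
df <- data.frame(ID = 1:100)

# The grouping vector itself has one entry per row
grp <- rep(1:10, each = 10)
length(grp)                  # 100

# split() turns the column into a list with one element per group
pieces <- split(df$ID, grp)
length(pieces)               # 10, which is not nrow(df)
```

If that reading is right, the grouping vector itself (one value per row), rather than the result of split(), is probably what the split argument wants. I have not verified this against ffdfdply here.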
I would also like the split to work when the last group does not have a full set of rows, for example: T = ffdfdply(FDF3, split(FDF3$ID, rep(1:10,each=3)))
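For the uneven case, here is a minimal base R sketch of a grouping vector that still has one entry per row. The names n and chunk are made up for illustration. Note that rep(1:10, each = 3) covers only 30 rows, so it would fail the same length check.

```r
n <- 100       # number of rows, as in dim(FDF3)
chunk <- 3     # desired rows per group (hypothetical name)

# rep(1:10, each = 3) only covers 30 of the 100 rows
length(rep(1:10, each = chunk))   # 30

# One group id per row; the last group is simply shorter
grp <- ceiling(seq_len(n) / chunk)
length(grp)                       # 100
max(grp)                          # 34 groups
sum(grp == max(grp))              # 1 row in the last group
```

Whether ffdfdply accepts such a vector directly (for instance after converting it to an ff vector) is an assumption I have not tested.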
I've been on this for at least 20 hours.
I couldn't figure out the correct usage of the ffdfdply function, and I still don't know whether it would have been the right tool for this. However, I have put together a workaround and hope someone finds it useful. It is admittedly ugly, so I'm open to suggestions on how to simplify it and would appreciate your comments.