I am using R with Hadoop streaming. At the reducer, the value is a character vector where each element is a string containing a few columns separated by a certain character, char(2) ("\002") in this case.
Is there an easy way to split the string into three fields and build a data frame from it?
Here is what I have done, but I have the feeling that I over-engineered it again.
inputarray <- c("20130806\00211\00291.55", "20130807\00211\00291.55", "20130808\00211\00291.55",
"20130809\00211\00291.55", "201308010\00211\00291.55", "201308011\00211\00291.55",
"201308012\00211\00291.55", "201308013\00211\00291.55", "201308014\00211\00291.55"
)
tmp <- lapply(inputarray, FUN=function(x) strsplit(x, rawToChar(as.raw(2))) )
tmp <- data.frame(matrix(unlist(tmp), ncol=3, byrow=TRUE))
names(tmp) <- c("date", "qtyavail", "price")
tmp
Thanks!
You could use read.table. First I add an element for the names at the beginning of inputarray.
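A minimal sketch of that idea, using the first three elements of the question's input. The variable names sep, with_header and df are just for illustration, and the colClasses choice (keeping date as character) is my assumption rather than something the question asks for:

```r
# Sample of the input from the question; "\002" is the field separator.
inputarray <- c("20130806\00211\00291.55",
                "20130807\00211\00291.55",
                "20130808\00211\00291.55")

sep <- "\002"

# Prepend a header element so read.table can pick up the column names.
with_header <- c(paste("date", "qtyavail", "price", sep = sep), inputarray)

# Parse the character vector directly via the text argument.
# colClasses is optional; it is set here to keep date as a string.
df <- read.table(text = with_header, sep = sep, header = TRUE,
                 colClasses = c("character", "integer", "numeric"))

str(df)
```

If you would rather leave inputarray untouched, passing col.names = c("date", "qtyavail", "price") with header = FALSE should give the same columns.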
Alternatively, you could also use cSplit from the splitstackshape package.
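A rough sketch of the cSplit route, assuming splitstackshape is installed. The column name raw and the result name dt are placeholders I made up. cSplit returns a data.table and names the new columns after the split column (raw_1, raw_2, ...), so they are renamed afterwards:

```r
library(splitstackshape)

# Sample of the input from the question; "\002" is the field separator.
inputarray <- c("20130806\00211\00291.55",
                "20130807\00211\00291.55",
                "20130808\00211\00291.55")

# cSplit works on a data frame column, so wrap the vector first.
dt <- cSplit(data.frame(raw = inputarray, stringsAsFactors = FALSE),
             splitCols = "raw", sep = "\002")

# Replace the generated column names with meaningful ones.
names(dt) <- c("date", "qtyavail", "price")

str(dt)
```

By default cSplit converts column types (type.convert = TRUE), so date will probably come back as a number rather than a string. Set type.convert = FALSE if you want to keep everything as character.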