Consider this
do.call(rbind, list(data.table(x=1, b='x'), data.table(x=1, b=NA)))
returns
   x  b
1: 1  x
2: 1 NA
but
do.call(rbind, list(data.table(x=1, b=NA), data.table(x=1, b='x')))
returns
   x  b
1: 1 NA
2: 1 NA
How can I force the first behavior without reordering the contents of the list?
data.table is much faster in my MapReduce jobs (calling data.table ~10*3MM times across 55 nodes, it is many times faster than a data frame), so I want this to work.
Regards, Saptarshi
As noted by Frank, the problem is that there are (somewhat invisibly) several different types of NA. The one produced when you type NA at the command line is of class "logical", but there are also NA_integer_, NA_real_, NA_character_, and NA_complex_.

In your first example, the initial data.table sets the class of column b to "character", and the NA in the second data.table is then coerced to an NA_character_. In the second example, though, the NA in the first data.table sets column b's class to "logical", and when the same column in the second data.table is coerced to "logical", its "x" is converted to a logical NA. (Try as.logical("x") to see why.)

That's all fairly complicated (to articulate, at least), but there is a reasonably simple solution. Just create a 1-row template data.table and prepend it to each list of data.tables you want to rbind(). It will establish the class of each column to be what you want, regardless of which data.tables follow it in the list passed to rbind(), and it can be trimmed off once everything else is bound together.
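
A minimal sketch of the template approach, using the two tables from the question. The names template, dts and res are made up for illustration, and the template's column classes (numeric x, character b) are assumed from the example data. The exact coercion rules inside rbind() may differ between data.table versions, but with the template in front the result should have a character column b either way.

```r
library(data.table)

# The different NA types mentioned above
class(NA)             # "logical"
class(NA_character_)  # "character"
as.logical("x")       # NA: "x" is not a truth value, so it becomes a logical NA

# Hypothetical 1-row template that fixes the class of each column
template <- data.table(x = numeric(1), b = character(1))

# The list from the question, in the problematic order
dts <- list(data.table(x = 1, b = NA), data.table(x = 1, b = "x"))

# Prepend the template, bind everything, then trim the template row off
res <- do.call(rbind, c(list(template), dts))[-1]

class(res$b)          # "character"
```

If you build the small tables yourself, writing b = NA_character_ instead of b = NA at creation time should avoid the problem without a template, since the column is then character from the start.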