Consider this:
do.call(rbind, list(data.table(x=1, b='x'),data.table(x=1, b=NA)))
returns
   x  b
1: 1  x
2: 1 NA
but
do.call(rbind, list(data.table(x=1, b=NA),data.table(x=1, b='x')))
returns
   x  b
1: 1 NA
2: 1 NA
How can I force the first behavior without reordering the contents of the list?
data.table is much faster than data.frame in my MapReduce jobs (I am calling data.table ~10*3MM times across 55 nodes, and it is many times faster), so I want this to work.
Regards, Saptarshi
As noted by Frank, the problem is that there are (somewhat invisibly) several different types of NA. The one produced when you type NA at the command line is of class "logical", but there are also NA_integer_, NA_real_, NA_character_, and NA_complex_.

In your first example, the initial data.table sets the class of column b to "character", and the NA in the second data.table is then coerced to an NA_character_. In the second example, though, the NA in the first data.table sets column b's class to "logical", and, when the same column in the second data.table is coerced to "logical", it's converted to a logical NA. (Try as.logical("x") to see why.)

That's all fairly complicated (to articulate, at least), but there is a reasonably simple solution. Just create a 1-row template data.table, and prepend it to each list of data.tables you want to rbind(). It will establish the class of each column to be what you want, regardless of what data.tables follow it in the list passed to rbind(), and can be trimmed off once everything else is bound together.
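
A minimal sketch of the template approach, applied to the list from the question in its problematic order. The names dts, template, and bound are made up for illustration, and the template's placeholder values (0 and "") are arbitrary since that row is dropped at the end.

```r
library(data.table)

# A bare NA is logical; the typed variants carry an explicit class.
class(NA)             # "logical"
class(NA_character_)  # "character"
as.logical("x")       # NA, because "x" has no logical interpretation

# The list from the question, with the NA-only data.table first.
dts <- list(data.table(x = 1, b = NA), data.table(x = 1, b = "x"))

# Hypothetical 1-row template that fixes the class of each column:
# x is numeric, b is character.
template <- data.table(x = numeric(1), b = character(1))

# Prepend the template, bind everything, then drop the template row.
bound <- do.call(rbind, c(list(template), dts))[-1]

class(bound$b)  # "character"
bound$b         # NA "x"
```

If you control how the individual data.tables are built, writing b = NA_character_ instead of b = NA when constructing them should avoid the problem at the source, since the column then starts out as "character".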