R data.table dcast() function adds garbage after decimal point?


After a long struggle with my code, I think I have found some strange behavior in the dcast() function from the data.table package. Can anyone confirm it, or am I doing something wrong?

For the sake of example:

library(data.table)
tt <- data.table(a=runif(n=300,min=0,max=1000000),
                 b=rep(paste("d",1:3,sep="",collapse=NULL),each=100),
                 c=rep(LETTERS[1:3],each=100))
t2 <- dcast(tt, c~b, fun.aggregate=sum, value.var = "a")
t2
# c         d1         d2         d3
# 1: A 2531364379          0          0
# 2: B          0 2527589493          0
# 3: C          0          0 2532147262

Now, I would assume that the numbers stored in t2 are exactly the ones printed above. But they are not, since some garbage appears after the decimal point. For example, in column d3:

t2$d3[3]-round(t2$d3[3],0)
# [1] 0.3269196

There is 1 solution below


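Use options(digits=22) (or some other high value; 22 is the maximum). By default R prints only 7 significant digits, so the fractional part of a large number is not shown. This has nothing to do with how the number is stored, just how it is represented on the console.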

A reproducible example:

library(data.table)
set.seed(42)
tt <- data.table(a=runif(n=300,min=0,max=1000000),
                 b=rep(paste("d",1:3,sep="",collapse=NULL),each=100),
                 c=rep(LETTERS[1:3],each=100))
t2 <- dcast(tt, c~b, fun.aggregate=sum, value.var = "a")
t2
#         c       d1       d2       d3
#    <char>    <num>    <num>    <num>
# 1:      A 52447875        0        0
# 2:      B        0 51995321        0
# 3:      C        0        0 44077214
t2$d3[3]-round(t2$d3[3],0)
# [1] 0.4191433

To better see the digits:

options(digits=22)
t2
#         c                 d1                 d2                 d3
#    <char>              <num>              <num>              <num>
# 1:      A 52447874.720674008        0.000000000        0.000000000
# 2:      B        0.000000000 51995320.511283353        0.000000000
# 3:      C        0.000000000        0.000000000 44077214.419143274

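However, there is no problem with the underlying numbers. runif() returns non-integer values, so their sums have a fractional part as well. Regardless of the value of digits, that fractional part is still there; the default setting just does not print it.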
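If changing the global option feels heavy-handed, base R can also show more digits for a single call. The following is a minimal sketch, not part of the original answer: x is an illustrative variable holding the d3 sum copied from the output above.

```r
# x: the d3 sum copied from the printed table above (illustrative name)
x <- 44077214.419143274

print(x)                # uses getOption("digits"), which is 7 by default
print(x, digits = 12)   # more significant digits, for this call only
format(x, nsmall = 3)   # character string with at least 3 decimals
sprintf("%.3f", x)      # "44077214.419", independent of the digits option

# The stored value is unaffected by any of the display calls above
frac <- x - round(x)
frac
```

All four display calls read the same stored double; only the text that reaches the console differs.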

The difference between what a number is and how it is printed can be demonstrated as follows:

options(digits=1)
pi
# [1] 3
options(digits=22)
pi
# [1] 3.1415926535897931

At no point did the real value of pi change, just how it is shown on the console.
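The same check can be run on a sum, which is closer to the case in the question. This is a small sketch with made-up values (chosen so they are exactly representable as doubles), and it also restores the previous digits setting afterwards, using the fact that options() returns the old values.

```r
# Hypothetical values, chosen to be exactly representable in binary
s <- sum(c(0.25, 0.5, 1000000.125))

old <- options(digits = 7)   # options() returns the previous settings
shown_default <- format(s)   # fractional part is rounded away in the text

options(digits = 15)
shown_more <- format(s)      # fractional part is now visible in the text

options(old)                 # restore whatever was set before

# The stored value never changed, only its printed form
identical(s, 1000000.875)
shown_default
shown_more
```

Saving and restoring the option this way keeps the change from leaking into the rest of a session or script.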