R: the number of significant digits leads to unexpected inequality results when using eval and parse(text = ...)

I am working on Boolean rules for terminal node assignment in CART-like trees, as part of my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html).

I have noticed problematic behavior when evaluating inequalities that are built as character strings and then run through eval and parse(text = ...). The issue has to do with how R handles the internal representation of a number.

Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.

> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"

This rule checks whether the object x is less than or equal to pi, where pi is written out to 15 significant digits (14 decimal places). Now I will assign x the values 1, 2, 3 and pi:

> x = c(1,2,3,pi)

Here's what x is up to 15 digits

> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979

Now let's evaluate this

> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE

Whooaaaaa, it looks like pi is not less than or equal to pi. Right?

But if I hard-code the last element of x as pi written to 14 decimal places, it works:

> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

In the first case, the internal representation of pi carries more digits than the 15 significant digits written into the rule string, so when R evaluates the expression, the pi stored in x is slightly greater than the constant parsed from the string, and the comparison returns FALSE. In the second case both sides come from the same 15-digit literal, so the result is TRUE.

How can I avoid this? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.

One solution I use is to add a small tolerance value.

> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

However, this seems like an ugly solution.

Any comments and suggestions are greatly appreciated!

There is 1 best solution below

You could refer to pi by name, or go through a function instead, so that pi never gets stringified (which is your first problem here):


rule <- "x <= pi"
x <- c(1, 2, 3, pi)

eval(parse(text = rule)) ## All TRUE

## another way might be to throw stuff you need uneval'ed into a function or a block:

my_pi <- function() {
    pi
}

rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE


You will still suffer from the usual floating-point issues, but imprecise stringification won't be your problem anymore.
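If the threshold is only known at run time and cannot be referred to by a fixed name, one possible variation on the same idea is to build the rule as a call with bquote() rather than as text, so the numeric value is embedded directly and never passes through a string. This is only a sketch; the variable name `threshold` is a hypothetical stand-in for whatever split value your tree produces.

```r
x <- c(1, 2, 3, pi)
threshold <- pi  # hypothetical split value, computed elsewhere in practice

# bquote() substitutes the *value* of threshold into the call at .(),
# so the double is stored in the expression without being converted to text
rule <- bquote(x <= .(threshold))

# rule is already a call object, so no parse() step is needed
result <- eval(rule)
print(result)
```

Because the constant inside the call is the same double as the one stored in x, the last comparison is pi <= pi. If the rule must be shown to a user, deparse(rule) gives a printable version, which is again subject to rounding and so is best kept for display only.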

Here's why your approach didn't work:


> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074

The stringified pi is less than R's pi by about 3e-15, which is several times the spacing between adjacent doubles near pi.

The paste manual says it uses as.character to convert numbers to strings, and the as.character documentation in turn says it uses 15 significant digits, which is what you are observing.
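If the rule really has to stay a character string, one option is to format the number yourself with more digits than the default. The sketch below assumes that 17 significant digits, which are enough in principle to identify an IEEE double uniquely, are parsed back to the same value by R on your platform; that holds for pi here, but it is worth checking for your own thresholds. The names `threshold` and `rule17` are hypothetical.

```r
x <- c(1, 2, 3, pi)
threshold <- pi  # hypothetical split value

# Write the threshold with 17 significant digits instead of relying on
# the default number-to-string conversion used by paste0()
rule17 <- paste0("x <= ", sprintf("%.17g", threshold))
print(rule17)

res17 <- eval(parse(text = rule17))
print(res17)
```

The rule string is less readable this way, so keeping a short version for display and the 17-digit version for evaluation may be a reasonable compromise.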