exists and sapply: why are these functions different?


Why are the two functions fn and gn below different? I don't think they should be, but I must be missing something.

vars <- letters[1:10]
a <- b <- 1
fn <- function (d) {
    sapply( vars, exists )
}
gn <- function (d) {
    sapply( vars, function (x) { exists(x) } )
}
fn(d=2)
#    a     b     c     d     e     f     g     h     i     j 
# TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE 
gn(d=2)
#    a     b     c     d     e     f     g     h     i     j 
# TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE 
exists("i")
# [1] FALSE

There are two differences:

  1. gn(d=2) says that d exists, but why doesn't fn(d=2)?
  2. fn(d=2) says that i exists, when gn(d=2) does not. This is puzzling, because I haven't defined i anywhere.

Note: This is on R version 3.2.0; the second behavior seems to be new in that version (see below).


1
On BEST ANSWER

Why i is different...

It looks like there were changes in R 3.2. An index variable i has been added to the current environment of lapply (which is what sapply actually calls). This goes along with the new behavior to force evaluation of the parameters passed along to the function you are applying over. This means that you now have access to the index of the current iteration you are on in the loop.

The reason fn and gn behave differently is that exists() starts looking in the environment from which it is called. In the case of fn, that is the environment where this i variable has been created. In the case of gn, it is the environment of your anonymous function. When R cannot find a symbol in the local environment, it searches environments based on where functions were defined, not where they are called. This means R will not find the i variable, since your anonymous function is defined in a place where the i variable does not exist.
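The same contrast can be reproduced without sapply. The following is a minimal sketch; caller, outside, and only_in_caller are hypothetical names chosen for illustration, and it assumes nothing called only_in_caller exists in your workspace or on the search path.

```r
# caller() stands in for lapply(): it owns a local variable and then
# calls whatever function it is given.
caller <- function(f) {
  only_in_caller <- 1   # visible only inside caller()'s frame
  f("only_in_caller")
}

# Defined at top level, so its enclosing environment is the top level,
# not caller()'s frame.
outside <- function(x) exists(x)

direct  <- caller(exists)   # exists() is called from caller()'s frame
wrapped <- caller(outside)  # exists() is called from outside()'s frame

print(c(direct = direct, wrapped = wrapped))
```

Passing exists directly means the search starts in caller's frame, so the local variable is found. Wrapping it in a function defined elsewhere moves the starting point to that function's frame, whose parent is where it was defined, so the local variable is not found.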

We can write a little helper function to make it easier to grab the current index.

idx <- function() get("i", parent.frame(2))
sapply(letters[1:3], function(x) paste(idx(), x))
#     a     b     c 
# "1 a" "2 b" "3 c"

As far as I can tell this is currently undocumented behavior. It may change in future versions of R.
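If you need the index and would rather not rely on that internal variable, one option is to iterate over the positions yourself. This is only a sketch of an alternative, not what sapply does internally; x and labelled are illustrative names.

```r
x <- letters[1:3]

# Iterate over positions, so the index is an ordinary argument (k)
# rather than something fetched from lapply's frame.
labelled <- vapply(seq_along(x), function(k) paste(k, x[[k]]), character(1))
names(labelled) <- x

print(labelled)
```

This produces the same "1 a", "2 b", "3 c" labels as the helper above, using only documented behavior.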

Why d is different...

The discrepancy with the d variable is a more direct scoping issue. Again, R creates a new environment in which it calls exists. The parent of this environment is the base namespace (whose own parent is the global environment, which is why a and b are found). So when you call exists, it looks where it was called from (the environment where i exists), and since it doesn't find d there, it moves on up that parent chain. The environment of the current function, fn, is never searched. You could explicitly search the current environment with:

fn <- function (d) {
    sapply( vars, exists, where=environment() )
}
fn(d=2)
#    a     b     c     d     e     f     g     h     i     j 
# TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE 
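A stricter variant, sketched here under the assumption that you only care about names local to the function, is to capture the function's environment and turn off inheritance. check_args and here are hypothetical names used for illustration.

```r
check_args <- function(d) {
  here <- environment()   # the frame of this call, which holds d
  vapply(
    c("a", "d", "sapply"),
    # inherits = FALSE restricts the lookup to `here` alone, so globals
    # and base functions are not reported.
    function(nm) exists(nm, envir = here, inherits = FALSE),
    logical(1)
  )
}

res <- check_args(d = 2)
print(res)
```

With inherits = FALSE only d is reported as existing, because it is the only one of the three names bound in the function's own frame.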

For more information on environments in R, I suggest you read the Environments section of Advanced R.

0
On

Regarding the i, I could not reproduce it. But I will try to explain the other differences, first the c that both functions find and the d that only gn() finds.

Both functions find c because they are finding the base function c.
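A quick way to check this is the mode argument of exists. The sketch below assumes a session in which you have not created your own object named c.

```r
# With no mode restriction, any object named "c" counts,
# including the base function c().
found_any      <- exists("c")
found_function <- exists("c", mode = "function")

# Restricting to numeric objects skips the function; this is FALSE
# unless a numeric `c` has been defined somewhere on the search path.
found_numeric  <- exists("c", mode = "numeric")

print(c(any = found_any, fun = found_function, num = found_numeric))
```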

Now as to the d, R is lexically scoped. You might want to look at the question "Environments in R, mapply and get". So, in the first function fn(), exists doesn't look for d in the local environment of fn(), but in the global environment (and there isn't a d there).

Note, however, that in gn() you define the function that uses exists() as an anonymous function inside gn(). Look:

gn <- function (d) {
    sapply( vars, function (x) { exists(x) } ) # defined anon function
}

So the environment of gn() is the parent environment of the anonymous function function (x) { exists(x) }. That is why it searches inside gn and finds the local argument d, returning TRUE.

4
On

The important difference is the value for "d", the name of the formal argument in the call. The default settings for the exists function are to look "above", not "in". In the first instance there is no "d"-named entity in .GlobalEnv, which is where the function was called from. In the second instance, there is a d (as a name) in the surrounding environment.

People who are finding "i" or "c" are finding the function c or an index from a for-loop. (The index variable persists after a for-loop finishes.)
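The persistence of the loop index is easy to confirm. Here is a small sketch that runs the loop inside a function so it does not touch your workspace; loop_then_check is a hypothetical name.

```r
loop_then_check <- function() {
  before <- exists("i", inherits = FALSE)  # no local i yet
  for (i in 1:3) {
    # empty body; only the index matters here
  }
  after <- exists("i", inherits = FALSE)   # i is still bound after the loop
  list(before = before, after = after, value = i)
}

res <- loop_then_check()
str(res)
```

After the loop, i remains bound to its last value, which is how a leftover i at top level can make exists("i") return TRUE in an otherwise unrelated session.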

It appears that the "i" difference occurs only in the latest version, which I was not seeing on this machine since it is maxed out at 3.1.3 with my current OS version. It could also have happened to people who had used "i" as an index variable in a for-loop in the current session (or were working with an earlier saved environment).

0
On

It is odd behavior. I'm sure there's a reasonable explanation for it. Here's a brand new R session. I didn't even use RStudio this time, which produces the same output.

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> exists('i')
[1] FALSE
> sapply('i', exists)
   i 
TRUE 

More odd behavior:

> lapply('i', exists)
[[1]]
[1] TRUE

> vapply('i', exists, logical(1))
   i 
TRUE 
> tapply('i', 'i', exists)
   i 
TRUE 
> mapply(exists, list('i'))
[1] FALSE
> rapply(list('i'), exists)
[1] FALSE
> exists('i')
[1] FALSE
> ls()
character(0)

Both mapply and rapply return FALSE. They both also require lists as arguments. That might have something to do with it. Perhaps the other apply functions are picking something up in some environment within R that the bare function and some apply functions aren't.