Why does Google's R style guide recommend <- for assignment, rather than =?


I read the Google style guide for R. For "Assignment", they say:

Use <-, not =, for assignment.

GOOD:
x <- 5

BAD:
x = 5

Can you tell me what the difference is between these two methods of assignment, and why one is to be preferred over the other?


There are 2 best solutions below

0
On BEST ANSWER

I believe there are two reasons. One is that <- and = have slightly different meanings depending on context. For example, compare the behavior of the statements:

printx <- function(x) print(x)
printx(x="hello")
printx(x<-"hello")

In the second case, printx(x<-"hello") also assigns "hello" to x in the calling environment (once the argument is evaluated), whereas printx(x="hello") only binds the parameter x inside the function.
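A minimal sketch of that difference, assuming it is run at the top level of a session. The exists() checks and the cleanup of any leftover x are additions for illustration, not part of the original example:

```r
printx <- function(x) print(x)

# Make sure no x is left over from earlier code.
if (exists("x", inherits = FALSE)) rm(x)

printx(x = "hello")                            # binds the argument only
after_equals <- exists("x", inherits = FALSE)  # FALSE: no x was created here

printx(x <- "hello")                           # assigns x here, then passes its value
after_arrow <- exists("x", inherits = FALSE)   # TRUE: x now exists in this environment
```

Both calls print the same thing; only the arrow form leaves a variable behind.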

The other reason is historical. R, S, and the APL language that inspired their assignment syntax originally allowed only the arrow for assignment, which on APL keyboards was a single key (and so a single character). Ref: http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html

1
On

Both are used, just in different contexts. If we don't use them in the right contexts, we'll see errors. See here:

Use <- to define a variable in the current (local) environment.

#Example: creating a vector
x <- c(1,2,3)

#Here we could use = and that would happen to work in this case.

Using <<- , as Joshua Ulrich says, searches the parent environments "for an existing definition of the variable being assigned." It assigns to the global environment if no parent environments contain the variable.

#Example: saving information calculated in a function
x <- list()
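this.function <- function(data){
  # ...misc calculations...
  result <- sum(data)  # placeholder standing in for the calculations above
  x[[1]] <<- result    # writes into the x defined outside the function
}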

#Here we can't use = or <-; either would only change a local copy of x inside the function.
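To see why the plain arrow is not enough here, a small sketch comparing the two inside a function. The names results, store_local and store_super are hypothetical, chosen only for this illustration:

```r
results <- list()

# Hypothetical helper: tries to store a value with the ordinary arrow.
store_local <- function(value) {
  results[[1]] <- value    # modifies a local copy of results only
  invisible(NULL)
}

# Hypothetical helper: stores a value with the superassignment arrow.
store_super <- function(value) {
  results[[1]] <<- value   # modifies results in the enclosing environment
  invisible(NULL)
}

store_local(10)
length_after_local <- length(results)   # still 0: the outer list is untouched

store_super(10)
length_after_super <- length(results)   # now 1: the outer list was updated
```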

Use = to bind a value to a named argument in a function call.

#Example: plotting an existing vector (defining it first)
first_col <- c(1,2,3)
second_col <- c(1,2,3)
plot(x=first_col, y=second_col)

#Example: assigning x and y in the calling environment and passing them to 'plot' by position
plot(x<-c(1,2,3), y<-c(1,2,3))

#The first case is preferable and can lead to fewer errors.
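One way the arrow form can go wrong is argument order. The sketch below uses a hypothetical subtract function instead of plot, so that it runs without a graphics device and the effect of the order is visible in the result:

```r
# Hypothetical function whose result depends on which value goes where.
subtract <- function(x, y) x - y

# Named arguments are matched by name, so their order does not matter.
named <- subtract(y = 1, x = 10)          # computes 10 - 1

# Arrows are ordinary expressions, so the values are matched by position:
# the value assigned to y lands in the parameter x, and vice versa.
# As a side effect, y and x are also created in the calling environment.
positional <- subtract(y <- 1, x <- 10)   # computes 1 - 10
```

With = the call says which parameter each value belongs to; with <- the names on the left are unrelated to the function's parameter names.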

Then we use == if we're asking if one thing is equal to another, like this:

#Example: check if contents of x match those in y:
x <- c(1,2,3)
y <- c(1,2,3)
x==y
[1] TRUE TRUE TRUE

#Here we can't use <- or =; those would overwrite x with y instead of comparing them.
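A short sketch of comparison versus assignment. Here y is changed in its last element (an assumption made for illustration) so that the comparison is not trivially all TRUE:

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

elementwise <- x == y      # compares element by element; x and y are unchanged
all_equal <- all(x == y)   # collapses the comparison to a single TRUE or FALSE

x = y                      # assigns: x is overwritten with the contents of y
```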