I'm trying to run a numerical simulation across a range of points from a data set created with expand.grid. I'd like to use plyr or dplyr for this if possible, but I don't understand the syntax.

Is there a small change to the code below that applies f to each pair of x and y values individually?

f <- function(x, y) {
    A <- data_frame(a = x*runif(100) - y)
    B <- data_frame(b = A$a - rnorm(100)*y)
    sum(A$a) - sum(B$b)
}

X <- expand.grid(x = 1:10, y = 2:8)
X %>% mutate(z = f(x, y))

I had hoped ddply might make this easier.

EDIT: This seems to behave as intended:

 X %>% ddply(.(x, y), transform, z = f(x, y))

On BEST ANSWER

Let's rewrite your function to do the same thing without the data_frame calls. Using plain vectors will be faster:

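f <- function(x, y) {
    # 100 uniform draws scaled by x and shifted by y
    a <- x * runif(100) - y
    # a minus normal noise scaled by y
    b <- a - rnorm(100) * y
    # difference of the two sums, returned as a single number
    sum(a) - sum(b)
}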
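
For context on why the original mutate call did not give one result per row: mutate hands f the whole x and y columns at once, and because f ends in sum it returns a single number, which is then recycled into every row. A minimal base R sketch of that behaviour, assuming f and X as defined in the question (suppressWarnings is only there to hide the recycling warnings from mixing vectors of length 70 and 100):

```r
# Same computation as f above, repeated so the snippet runs on its own
f <- function(x, y) {
    a <- x * runif(100) - y
    b <- a - rnorm(100) * y
    sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)  # 70 rows

# Calling f on the full columns, which is what mutate(z = f(x, y)) does
z_all <- suppressWarnings(f(X$x, X$y))

length(z_all)  # a single value, not one per row
nrow(X)        # the number of values we actually wanted
```

So the task is to arrange for f to be called once per row rather than once per data frame.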

Since you want to apply this to every row, you could do it with plyr or dplyr. These tools are made for "split-apply-combine", where you split a data frame into pieces by some grouping variable, do something to each piece, and put the pieces back together. You want to do something to every individual row, so we set both x and y as grouping variables, which works because each combination of x and y uniquely identifies a row:

# plyr
ddply(X, .(x, y), plyr::mutate, z = f(x, y))

# dplyr
group_by(X, x, y) %>% dplyr::mutate(z = f(x, y))

For both plyr and dplyr, the mutate function is used because you want to add a column to an existing data frame, keeping the same number of rows. The other common function to use is summarize, which is used when you want to condense groups that have multiple rows into a single summary row. mutate is very similar to base::transform.
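
To make the mutate versus summarize difference concrete, here is a small sketch with dplyr. The data frame df and its columns g and v are made up for illustration and are not part of the question:

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows, group "b" has one
df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))

# mutate adds a column and keeps every row
added <- df %>% group_by(g) %>% mutate(total = sum(v))
nrow(added)      # 3

# summarize collapses each group into a single row
condensed <- df %>% group_by(g) %>% summarize(total = sum(v))
nrow(condensed)  # 2
```

In the grid case every group has exactly one row, so either verb would return 70 rows; mutate is simply the one that states the intent of adding a column.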

There is really no advantage to using plyr for data frame manipulation: dplyr is faster and, most people think, easier to understand. dplyr really shines when you have more complex manipulations and are using groups rather than individual rows. For individual rows, the base function mapply works well:

X$z = mapply(f, X$x, X$y)

(thanks to @jeremycg in the comments). You can use dplyr but there's no reason to do so in this case.
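
If it helps to see what mapply is doing, here is a sketch with a deterministic stand-in for f. The function g is hypothetical and exists only so that the output is predictable; the real f uses random draws:

```r
# Hypothetical stand-in for f with no randomness, so results can be checked by eye
g <- function(x, y) x * 10 + y

X <- expand.grid(x = 1:3, y = 1:2)

# mapply calls g once per (x, y) pair, walking both vectors in parallel
X$z <- mapply(g, X$x, X$y)
X$z  # 11 21 31 12 22 32
```

Each element of z comes from a separate call with a single x and a single y, which is the behaviour the simulation function needs.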