I'm trying to run a numerical simulation across a range of points from a data set created with expand.grid. I'd like to use plyr or dplyr for this if possible. However, I don't understand the syntax.
Is there a small perturbation on the code below that applies the values of x and y individually against f?
f <- function(x, y) {
  A <- data_frame(a = x*runif(100) - y)
  B <- data_frame(b = A$a - rnorm(100)*y)
  sum(A$a) - sum(B$b)
}
X <- expand.grid(x = 1:10, y = 2:8)
X %>% mutate(z = f(x, y))
I had hoped ddply might make this easier.
EDIT: This seems to behave as intended:
X %>% ddply(.(x, y), transform, z = f(x, y))
Let's rewrite your function to do the same thing without the data_frame calls; just using vectors will be faster.

Since you want to apply this to every row, you could do it with plyr or dplyr. These tools are made for "split-apply-combine", where you split a data frame into pieces by some grouper, do something to each piece, and put it back together. You want to do something to every individual row, so we set both x and y as grouping variables, which works because a combination of x and y uniquely defines a row.
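A minimal sketch of both steps, assuming dplyr is installed. The name f_vec is only an illustrative choice for the vector-only version of f, and Z is a hypothetical name for the result:

```r
library(dplyr)

# Same computation as f, but with plain vectors instead of data_frame()
f_vec <- function(x, y) {
  a <- x * runif(100) - y
  b <- a - rnorm(100) * y
  sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)

# Grouping by x and y makes every group exactly one row,
# so mutate() evaluates f_vec once per (x, y) pair
Z <- X %>%
  group_by(x, y) %>%
  mutate(z = f_vec(x, y)) %>%
  ungroup()
```

With plyr, the call from the question's edit should work the same way with mutate in place of transform, along the lines of ddply(X, .(x, y), mutate, z = f_vec(x, y)).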
For both plyr and dplyr, the mutate function is used because you want to add a column to an existing data frame, keeping the same number of rows. The other common function to use is summarize, which is used when you want to condense groups that have multiple rows into a single summary row. mutate is very similar to base::transform.

There is really no advantage to using plyr for data frame manipulation: dplyr is faster and, most people think, easier to understand. dplyr really shines when you have more complex manipulations and are using groups rather than individual rows. For individual rows, the base function mapply works well (thanks to @jeremycg in the comments). You can use dplyr, but there's no reason to do so in this case.
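As a sketch of the mapply approach, again using the illustrative name f_vec for the vector-only version of the function:

```r
# Vector-only version of f (f_vec is an illustrative name)
f_vec <- function(x, y) {
  a <- x * runif(100) - y
  b <- a - rnorm(100) * y
  sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)

# mapply() walks the x and y columns in parallel and
# calls f_vec once per pair, returning one value per row
X$z <- mapply(f_vec, X$x, X$y)
```

If you prefer to stay inside a pipeline, the same call can go inside mutate, e.g. X %>% mutate(z = mapply(f_vec, x, y)).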