ddply with chunks from multiple data frames


How can ddply (and similar functions) work with multiple data frames?

For example, I have one dataframe with information about cars in a family

car <- data.frame(name=c('aaa','aaa','bbb'), cars=c('honda','chevy','datsun'))

and a second dataframe with family members

 people <- data.frame(name=c('aaa','bbb','bbb'), age=c(25,18,33))

I would like to apply a function

 neatfun <- function(car_chunk, people_chunk) {
   ## analysis with age and type of cars
 }

to the corresponding chunks of car and people, something along the lines of

 analysis <- ddply( list(car,people), "name", neatfun)

where ddply would split the list of dataframes by name and then pass the corresponding chunks of each dataframe to the neatfun function.

At the moment, I'm willing to assume that every "name" appears in all data frames, so I don't have to worry about families with cars (but no people) or with people (but no cars).

Thanks


There is 1 solution below


Without knowing exactly what analysis you have in mind, I can see a few ways to proceed. Start by combining your data into a single dataframe.

> library(dplyr)
> df <- left_join(car, people)
Joining by: "name"
> df
  name   cars age
1  aaa  honda  25
2  aaa  chevy  25
3  bbb datsun  18
4  bbb datsun  33

Then use dplyr operations to do analysis.

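processed_df <- df %>% 
  group_by(name) %>% 
  ## do analysis here, e.g. summarise, mutate, filter, top_n
  ## (the summarise below is only a placeholder)
  summarise(n_cars = n_distinct(cars), oldest = max(age))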

If you have something a little more complex to do, write a function that takes in a dataframe, does some work on it, and returns a new dataframe. Then split your data into a list, apply the function to each piece, and either keep the results in a list or, if it makes sense, bind them all back together.

This assumes we have the same df from above, the joined dataframe. (Base R's merge(car, people) should give the same rows as left_join in dplyr here.)

df_func <- function(df){  ## take in a dataframe, do stuff, return a dataframe
  df_new <- df  ## placeholder: replace with some analysis
  return(df_new)
}
df_list <- split(df, df$name)  ## make a list of dataframes,
                               ## one element per name
df_new_list <- lapply(df_list, FUN = df_func)  ## apply function to each element
total_list <- bind_rows(df_new_list) ## put all the elements into a single dataframe
                                     ## bind_rows is a dplyr tool (it replaced
                                     ## the deprecated rbind_all).
                                     ## otherwise you can use
total_list <- do.call(rbind, df_new_list)
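
If you would rather keep the two data frames separate instead of joining them first, one possible sketch is to split each one by name and walk the two lists in parallel with base R's Map. The neatfun body below is a made-up placeholder (it counts cars and averages ages), and stringsAsFactors = FALSE is my addition. It also leans on the question's assumption that every name appears in both data frames.

```r
## Toy data from the question (stringsAsFactors = FALSE is an added assumption
## so that the names stay plain character vectors)
car <- data.frame(name = c('aaa', 'aaa', 'bbb'),
                  cars = c('honda', 'chevy', 'datsun'),
                  stringsAsFactors = FALSE)
people <- data.frame(name = c('aaa', 'bbb', 'bbb'),
                     age = c(25, 18, 33),
                     stringsAsFactors = FALSE)

## Hypothetical stand-in for neatfun: returns one summary row per family
neatfun <- function(car_chunk, people_chunk) {
  data.frame(name     = car_chunk$name[1],
             n_cars   = nrow(car_chunk),
             mean_age = mean(people_chunk$age),
             stringsAsFactors = FALSE)
}

## Split each data frame into a named list of chunks, one per name
car_list    <- split(car, car$name)
people_list <- split(people, people$name)

## Line the two lists up by name rather than relying on position
people_list <- people_list[names(car_list)]

## Call neatfun on each matching pair of chunks, then stack the results
analysis <- do.call(rbind, Map(neatfun, car_list, people_list))
```

Here analysis should end up with one row per name. Swap in whatever neatfun really needs to return; if it returns something other than a dataframe, keep the output of Map as a list instead of calling rbind.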