I have a function in R that compares a smaller vector of masses to a larger one, finds the positions where they match within a tolerance, and uses those positions to extract rows from a larger data frame.
compare_masses <- function(mass_lst) {
  match_df <- data.frame()    # accumulator for the matched rows
  for (i in seq_along(mass_lst)) {
    # positions in the larger mass vector within 0.02 of the current mass
    positions <- which(abs(AB_massLst_numeric - mass_lst[i]) < 0.02)
    rows <- AB_lst[positions, ]
    match_df <- rbind(match_df, rows)
  }
  match_df
}
where mass_lst is a vector of compound masses, e.g.:

mass_lst <- c(315, 243, 484, 121)

AB_massLst_numeric is the larger vector of masses:

AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232)

AB_lst is the larger data frame that I extract rows from using the positions vector, and match_df is an (initially empty) data frame that I rbind the matched rows onto.
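For reference, a reproducible setup could look something like this (the columns of AB_lst are stand-ins; the real data frame is much larger):

## Stand-in objects so the function above can be run;
## "mass" and "id" are hypothetical column names
mass_lst <- c(315, 243, 484, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232)
AB_lst <- data.frame(mass = AB_massLst_numeric,
                     id   = seq_along(AB_massLst_numeric))
match_df <- compare_masses(mass_lst)   # empty here, since the integer demo
                                       # masses are never within 0.02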
The problem is that this function has a for loop in it and takes very long on my real data, even when I call it with

test <- sapply(mass_lst, compare_masses)

So my question is: how can I make this function faster, and can the for loop be removed entirely? My real data is much bigger than the examples I provided. I can't think of a way to make this function work without iterating.
Use R's vector recycling feature. First construct your positions vector of length N*m, where N is the number of rows in AB_lst and m is length(mass_lst). Then select rows from your data frame using this vector. See the complete runnable example below.
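A sketch of that idea, using the example vectors from the question (AB_lst here is a hypothetical stand-in, and the tolerance is widened from 0.02 to 15 only so the integer demo data actually produce matches):

mass_lst <- c(315, 243, 484, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232)
AB_lst <- data.frame(mass = AB_massLst_numeric,
                     id   = seq_along(AB_massLst_numeric))

N   <- nrow(AB_lst)    # same as length(AB_massLst_numeric)
m   <- length(mass_lst)
tol <- 15              # the question uses 0.02

## Recycling: the length-N vector AB_massLst_numeric is recycled m times
## against rep(mass_lst, each = N), giving all N*m comparisons in one shot
hits <- abs(AB_massLst_numeric - rep(mass_lst, each = N)) < tol

## Fold the N*m hit indices back onto row numbers 1..N of AB_lst
positions <- ((which(hits) - 1) %% N) + 1
match_df  <- AB_lst[positions, ]
match_df
##   mass id
## 1  323  1    (within tol of 315)
## 8  232  8    (within tol of 243)
## 2  474  2    (within tol of 484)

This replaces not just the loop but, more importantly, the repeated rbind calls, which are what make the original approach slow: growing a data frame inside a loop copies it on every iteration, while a single vectorized subset is one allocation.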