How to Write a Function That Will Loop Through Dataframes of Differing Lengths


I have two dataframes of different lengths, each with two value columns. For every row of each dataframe, I want the mean and the sum of the two value columns, and I'd like to write a single function that adds both of those columns to each dataframe.

Here's the code to recreate the dataframes:

library(tidyverse)


#Creating dataframes
day1<-c(1,2,3,4,5)
day2<-c(1,2,3,4)
value11<-c(6,7,8,9,10)
value12<-c(11,12,13,14,15)
value21<-c(2,4,6,8)
value22<-c(1,3,5,7)

df1<-data.frame(day1,value11,value12)
df2<-data.frame(day2,value21,value22)
dfs<-list(df1,df2)
names(dfs)<-c("df1","df2")

And here's my current loop which is able to calculate the average and sum columns for a single dataframe:

#creating the new mean and sum columns
for (i in 1:dim(df1)[1]) {
# wrap the two values in c(): mean(a, b) would treat b as the trim argument
df1$meanval[i] <- mean(c(df1$value11[i],df1$value12[i]))
df1$sumval[i] <- sum(df1$value11[i],df1$value12[i])
}

What I'd like to do now is find a way of applying that loop to both dataframes simultaneously. Here's what I was trying to use:

#creating the new mean and sum columns
SumAndMean<-function(x){
for (i in 1:dim(dfs)[[i]][1]) {
x$meanval[i] <- mean(x[[2]][i],x[[3]][i])
x$sumval[i] <- sum(x[[2]][i],x[[3]][i])
}
}

#Applying function to list of dataframes
lapply(seq_along(dfs), function(i) SumAndMean(dfs[i]))

So far this results in an error. I'm not sure, but I think it may have something to do with the fact that I'm using i to refer to both the subsections of the dfs list and the subsections of both df1 and df2. I'm not entirely sure how to rewrite my function to get around this. Any ideas? Thanks!

Mark On

One option, using map:

map(dfs, ~ mutate(., sum_row = rowSums(across(starts_with("value"))),
           mean_row = rowMeans(across(starts_with("value")))))

[[1]]
  day1 value11 value12 sum_row mean_row
1    1       6      11      17      8.5
2    2       7      12      19      9.5
3    3       8      13      21     10.5
4    4       9      14      23     11.5
5    5      10      15      25     12.5

[[2]]
  day2 value21 value22 sum_row mean_row
1    1       2       1       3      1.5
2    2       4       3       7      3.5
3    3       6       5      11      5.5
4    4       8       7      15      7.5
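
If you would rather not depend on the tidyverse, roughly the same result can be had in base R. This is only a sketch: `add_row_stats` is a name I made up for illustration, and it assumes the columns of interest all start with "value", as they do in the example data.

```r
# Rebuild the example list from the question
df1 <- data.frame(day1 = c(1, 2, 3, 4, 5),
                  value11 = c(6, 7, 8, 9, 10),
                  value12 = c(11, 12, 13, 14, 15))
df2 <- data.frame(day2 = c(1, 2, 3, 4),
                  value21 = c(2, 4, 6, 8),
                  value22 = c(1, 3, 5, 7))
dfs <- list(df1 = df1, df2 = df2)

# Hypothetical helper: add the row-wise sum and mean of every "value*" column
add_row_stats <- function(df) {
  value_cols <- grep("^value", names(df), value = TRUE)
  df$sum_row  <- rowSums(df[value_cols])
  df$mean_row <- rowMeans(df[value_cols])
  df  # return the modified copy
}

# lapply handles each dataframe separately, so differing row counts do not matter
result <- lapply(dfs, add_row_stats)
```

Because `rowSums` and `rowMeans` are vectorised over rows, no explicit loop over `i` is needed.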

The problem with the approach in your code is that (it sounds like) you want to iterate over two dataframes at the same time, but they have different numbers of rows. Once the loop goes past the last row of the shorter dataframe, there is no row left to read there, so the code returns an error before it reaches the end of the longer one.
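
If you want to keep the explicit loop, here is a minimal sketch of how `SumAndMean` could be repaired. As far as I can tell, a few other things in the original attempt also get in the way: `dim()` returns `NULL` for a plain list such as `dfs`, `dfs[i]` gives a one-element list rather than the dataframe itself (`dfs[[i]]` would), the function never returns `x`, and `mean(a, b)` reads `b` as the `trim` argument instead of a second value. The version below assumes, as in the question, that the value columns sit in positions 2 and 3.

```r
# Rebuild the example list from the question
df1 <- data.frame(day1 = c(1, 2, 3, 4, 5),
                  value11 = c(6, 7, 8, 9, 10),
                  value12 = c(11, 12, 13, 14, 15))
df2 <- data.frame(day2 = c(1, 2, 3, 4),
                  value21 = c(2, 4, 6, 8),
                  value22 = c(1, 3, 5, 7))
dfs <- list(df1 = df1, df2 = df2)

SumAndMean <- function(x) {
  # Create the new columns first so each row can be filled in turn
  x$meanval <- NA_real_
  x$sumval  <- NA_real_
  # Loop over the rows of the dataframe that was passed in, not over dfs
  for (i in seq_len(nrow(x))) {
    # c() combines the two values so mean() averages both of them
    x$meanval[i] <- mean(c(x[[2]][i], x[[3]][i]))
    x$sumval[i]  <- sum(x[[2]][i], x[[3]][i])
  }
  x  # return the modified copy; without this the caller gets NULL
}

# lapply passes each dataframe to the function one at a time
result <- lapply(dfs, SumAndMean)
```

Each call only ever sees one dataframe, so its loop runs for exactly that dataframe's number of rows and the two never need to be the same length.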