How do I write code to handle dealing with averaging replicates with NA values in R?

Question

How do I write code to handle dealing with averaging replicates with NA values in R?

46 Views Asked by Anna Kasper At 07 December 2023 at 18:05

I am currently working on a scientific project where each sample has two or three replicates.

The data (we'll call the dataframe 'raw') looks something along the lines of:

| SampleID | Var1     | Var2     | Var3     |...
| Unknown 1 |    NA    |    12    |    NA    |
| Unknown 1 |    NA    |    19    |    23    |
| Unknown 2 |    NA    |    NA    |    50    |
| Unknown 2 |    18.5  |    NA    |    32    |
| Unknown 2 |    8.12  |    NA    |    20    |
...

Where unknown 1 has two replicates and unknown 2 has three replicates. The NAs are for when values could not be obtained since the signal was too strong/too low.

I want to create a new dataframe called 'averages' that contains the averages of the replicates for each group (Sample ID) over all the Var# columns so that it looks more like this:

Averages of each sample over different variables

| SampleID  | Var1     | Var2     | Var3     |...
| Unknown 1 |    NA    |   15.5   |    23    |
| Unknown 2 |  26.62   |    NA    |    51    |
...

What I am struggling with is handling the NAs the way I want to when calculating the averages. In the new averages dataframe, I still want it to show:

'NA' for samples where ALL replicates were 'NA'. If I use any function that omits NAs when calculating the means, any sample with replicates that contain ALL NAs in at least one variable are not added to the new averages dataframe, which is not what I want. I still want to see all the samples listed in the new dataframe, just with their replicate/triplicate values combined into averages.
In the case where (1 of 2) or (2 of 3) replicates contained NA values, I want those NA values to be omitted. For instance, in the table above, 'Unknown 2' has triplicate values and in 'Var1', there is one triplicate value that is NA whereas the other triplicate values are "18.5" and "8.12". Thus, I want the NA value to be omitted and the mean calculation to instead look like (18.5 + 8.12)/2 even though there is supposed to be 3 replicate values.

#Approach 1 - worked only for one column at a time, omits the samples whose replicates ALL have NA values averages <- aggregate(raw$'Var1', list(raw$SampleID), FUN=mean, na.rm=TRUE)

#Approach 2 - did not not omit the NAs in mean calculation averages <- aggregate(.~SampleID, raw, mean, na.rm=TRUE)

#Approach 3 - #tried putting the colnames into a list, then calling the list in the following code. Same issue with samples being omitted if they have replicates with NA values.

avgcolnames <- colnames(raw[,sapply(results,is.numeric)])
averages <- aggregate(cbind(avgcolnames) ~ SampleID, data = raw, 
                        FUN = mean, na.rm = TRUE)

#Approach 4 - # worked on only on one column at a time and have to specify new column names

averages <- raw %>% 
  group_by(SampleID) %>%
  summarize(Var1 = mean(Var1, na.rm = TRUE))

Original Q&A

There are 1 best solutions below

**merv** · Answer 1 · 2023-12-07T18:15:24.210000

Approach #4 is along the way, next step is to make it generic, which we can do with tidy-select functionality and an anonymous function:

library(tidyverse)

iris %>%
  group_by(Species) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm=TRUE)))

## # A tibble: 3 × 5
##   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa             5.01        3.43         1.46       0.246
## 2 versicolor         5.94        2.77         4.26       1.33 
## 3 virginica          6.59        2.97         5.55       2.03

Here we select all numeric columns with across(where(is.numeric), <fn>) and the ~ mean(.x, na.rm=TRUE) is an anonymous function where .x is the argument and computes the mean with NAs excluded.

How do I write code to handle dealing with averaging replicates with NA values in R?

There are 1 best solutions below

Related Questions in R

Related Questions in GROUP-BY

Related Questions in AVERAGE

Related Questions in NA

Related Questions in REPLICATE

Trending Questions

Popular # Hahtags

Popular Questions