How to merge OTUs that appear in 2/3 replicates?

88 Views Asked by At

I am studying the microbiome of an organism and I have sequenced 3 PCR replicates for each sample. I am working with a phyloseq object and my sample_df looks like this (simplified):

> df
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 28974 taxa and 112 samples ]
sample_data() Sample Data:       [ 112 samples by 2 sample variables ]
tax_table()   Taxonomy Table:    [ 28961 taxa by 10 taxonomic ranks ]

>sample_data(df)
Sample Data:        [112 samples by 2 sample variables]:
                
            sample   replicate
s1_A        sample1       A
s1_B        sample1       B
s1_C        sample1       C
s2_A        sample2       A
s2_B        sample2       B
s2_C        sample2       C

I want to merge my ESVs reads, but only when they appear in two out of three replicates and I have no idea how to do it.

I know about the function merge_samples to merge all the replicates, but I would like to add the condition of 2/3 replicates

df_merged<- merge_samples(df, "sample", fun=sum) #merge samples

I have no idea how to continue, so any help would be great

thanks so much!

1

There are 1 best solutions below

3
Laura Drh On

In case someone needs it, I managed this way:

# Create an empty data frame with column names
      
      comm <- t(as.data.frame(otu_table(df)))
      merged_data <- data.frame(matrix(ncol = ncol(comm), nrow = 0))
      merged_data
      names(merged_data) <- colnames(comm)
      base_sample_names <- gsub("_.$", "", rownames(comm)) # Get the base names of the samples (without the '_A', '_B', and '_C' suffixes)      
      unique_base_samples <- unique(base_sample_names)  # Get the unique base sample names
      
      
      # For each unique base sample name:
      
      for (base_sample in unique_base_samples) {
        replicate_indices <- which(base_sample_names == base_sample)  # Get the indices of the rows corresponding to the three replicates
        replicates <- comm[replicate_indices, ]   # Extract the replicate rows
        
        
        
        # Compute the merged row
        
        merged_row <- apply(replicates, 2, function(column) { 
          
          # If the ESV is present in at least two replicates, return the sum of abundances. Can also be changed to 3 or just 1
          
          if (sum(column > 0) >= 2) {
            return(sum(column))
          } else {
            return(0)
          }
        })
        merged_row <- data.frame(t(merged_row))     #Make sure the result is a data frame with correct column names
        colnames(merged_row) <- colnames(comm)
        merged_data <- rbind(merged_data, merged_row)    # Add the merged row to the data frame
      }
      
      rownames(merged_data) <- unique_base_samples # Set the row names of the merged data
      print(merged_data) # Print the merged data
      merged_data<-t(merged_data)  #otu are rows