phyloseq to csv, with selected rows and columns?


As input to the R package SRS, I need a CSV file in which each column is a sample and each row is a taxon. I've been able to make similar outputs using psmelt, but the result contains all the metadata and the taxa are not in rows.

glom <- tax_glom(ps, taxrank='genus')

otus <- tax_table(glom)

dataexport <- psmelt(glom)

Another problem with those exports is that each rank (domain, phylum, etc.) gets its own column, whereas SRS wants domain;phylum;etc in a single column as the taxon name.

Does anyone know how to pick and choose which items you want as rows and columns from a phyloseq object? Or how to meld all the taxonomic ranks into one string?

Thank you very much.

ps

otu_table()   OTU Table:         [ 4282 taxa and 172 samples ]
sample_data() Sample Data:       [ 172 samples by 5 sample variables ]
tax_table()   Taxonomy Table:    [ 4282 taxa by 7 taxonomic ranks ]
refseq()      DNAStringSet:      [ 4282 reference sequences ]

jared_mamrot On

From the docs, SRS input:

Data frame (species count or OTU table) in which columns are samples and rows are the counts of species or OTUs. Only integers are accepted as data.

My guess is to use abundances(ps) from the microbiome package to get the required format (taxa in rows, samples in columns), e.g. using the example "dietswap" dataset:

# BiocManager::install("microbiome")
library(microbiome)
#> 
#>  Copyright (C) 2011-2022 Leo Lahti, 
#>     Sudarshan Shetty et al.
#> 
#> Attaching package: 'microbiome'
#> The following object is masked from 'package:ggplot2':
#> 
#>     alpha
#> The following object is masked from 'package:base':
#> 
#>     transform

data(dietswap)
ps <- dietswap
df <- as.data.frame(abundances(ps))
df
#>                                       Sample-1 Sample-2 Sample-3 Sample-4
#> Actinomycetaceae                             0        1        0        1
#> Aerococcus                                   0        0        0        0
#> Aeromonas                                    0        0        0        0
#> Akkermansia                                 18       97       67      256
#> Alcaligenes faecalis et rel.                 1        2        3        2
#> Allistipes et rel.                         336       63       36       96
#> Anaerobiospirillum                           0        0        0        0
#> Anaerofustis                                 0        1        0        0
#> Anaerostipes caccae et rel.                244      137       27       36
#> Anaerotruncus colihominis et rel.           12      108      203       68
#> Anaerovorax odorimutans et rel.              6       73       30       60
#> ...
#> Xanthomonadaceae                             1        1        1        3
#> Yersinia et rel.                             2        2        3        5

write.csv(df, "ps_abundances.csv")

Created on 2024-01-18 with reprex v2.1.0
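
If you only want some taxa and samples in the export, ordinary data frame indexing may be enough. The following is a minimal base R sketch, not tested against your data: the toy `df` stands in for `as.data.frame(abundances(ps))`, and `keep_taxa`, `keep_samples` and `sub` are hypothetical names chosen for illustration. It also checks the integer-only requirement quoted from the SRS docs before writing the file.

```r
# Toy stand-in for as.data.frame(abundances(ps)): taxa in rows, samples in columns
df <- data.frame(
  `Sample-1` = c(0, 18, 336),
  `Sample-2` = c(1, 97, 63),
  `Sample-3` = c(0, 67, 36),
  row.names = c("Actinomycetaceae", "Akkermansia", "Allistipes et rel."),
  check.names = FALSE  # keep the hyphens in the sample names
)

# Pick the taxa (rows) and samples (columns) to keep, by name
keep_taxa <- c("Akkermansia", "Allistipes et rel.")
keep_samples <- c("Sample-1", "Sample-3")
sub <- df[keep_taxa, keep_samples, drop = FALSE]

# SRS accepts only integers, so stop if any count is not a whole number
stopifnot(all(as.matrix(sub) == round(as.matrix(sub))))

# Write to a temporary file and read it back to confirm the layout survives
out <- tempfile(fileext = ".csv")
write.csv(sub, out)
back <- read.csv(out, row.names = 1, check.names = FALSE)
```

Reading the file back with `row.names = 1` is just a quick way to confirm that taxa ended up as rows and samples as columns.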


In terms of the taxonomic labels, perhaps:

tax_table(ps)
#> Taxonomy Table:     [130 taxa by 3 taxonomic ranks]:
#>                              Phylum            Family           
#> Actinomycetaceae             "Actinobacteria"  "Actinobacteria" 
#> Aerococcus                   "Firmicutes"      "Bacilli"        
#> Aeromonas                    "Proteobacteria"  "Proteobacteria" 
#> Akkermansia                  "Verrucomicrobia" "Verrucomicrobia"
#> Alcaligenes faecalis et rel. "Proteobacteria"  "Proteobacteria" 
#> Allistipes et rel.           "Bacteroidetes"   "Bacteroidetes"  
#>                              Genus                         
#> Actinomycetaceae             "Actinomycetaceae"            
#> Aerococcus                   "Aerococcus"                  
#> Aeromonas                    "Aeromonas"                   
#> Akkermansia                  "Akkermansia"                 
#> Alcaligenes faecalis et rel. "Alcaligenes faecalis et rel."
#> Allistipes et rel.           "Allistipes et rel."
#> ...

# in a single column?
library(tidyverse)
as.data.frame(tax_table(ps)) %>%
  pivot_longer(everything())
#> # A tibble: 390 × 2
#>    name   value           
#>    <chr>  <chr>           
#>  1 Phylum Actinobacteria  
#>  2 Family Actinobacteria  
#>  3 Genus  Actinomycetaceae
#>  4 Phylum Firmicutes      
#>  5 Family Bacilli         
#>  6 Genus  Aerococcus      
#>  7 Phylum Proteobacteria  
#>  8 Family Proteobacteria  
#>  9 Genus  Aeromonas       
#> 10 Phylum Verrucomicrobia 
#> # ℹ 380 more rows

Created on 2024-01-18 with reprex v2.1.0
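
If the goal is one "rank1;rank2;..." string per taxon rather than a long table, pasting the rank columns together row by row may be closer to what SRS expects. This is a hedged base R sketch: the toy `tax` and `counts` data frames stand in for `as.data.frame(tax_table(ps))` and `as.data.frame(abundances(ps))`, and `tax_string` is a hypothetical name. It assumes the combined strings are unique, which may not hold if some ranks are missing in your data.

```r
# Toy stand-in for as.data.frame(tax_table(ps)): one column per rank
tax <- data.frame(
  Phylum = c("Actinobacteria", "Firmicutes", "Verrucomicrobia"),
  Family = c("Actinobacteria", "Bacilli", "Verrucomicrobia"),
  Genus  = c("Actinomycetaceae", "Aerococcus", "Akkermansia"),
  row.names = c("Actinomycetaceae", "Aerococcus", "Akkermansia")
)

# Toy stand-in for the count table (taxa in rows, samples in columns)
counts <- data.frame(
  `Sample-1` = c(0L, 0L, 18L),
  `Sample-2` = c(1L, 0L, 97L),
  row.names = rownames(tax),
  check.names = FALSE
)

# Paste the ranks of each taxon together, left to right, separated by ";"
tax_string <- do.call(paste, c(tax, sep = ";"))

# Use the combined strings as row names, matching taxa by their original IDs
rownames(counts) <- tax_string[match(rownames(counts), rownames(tax))]
```

After this, `write.csv(counts, ...)` would put the combined taxonomy string in the first column. If you prefer the tidyverse, `tidyr::unite()` does a similar column-pasting job.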

Does that solve your problem?