How do you retrieve individual DNA sequences after importing an alignment into R?

829 Views Asked by At

I imported an alignment in FASTA format into R

    read.dna(file.choose(),format="fasta",skip=0)

My alignment looks something like this

Seq1 ATGCGGGAATGGACTCATGCATCG
Seq2 ATTCGATCTTGCTAGCTAGCTCGT
Seq3 ATATCGATGTCGATCGATCGACGA

If I want to call individual sequences from within this alignment (say Seq2 for example), what do I need to do ?

2

There are 2 best solutions below

0
On

I don't know where read.dna() comes from (there are >6000 CRAN packages, and almost 1000 Bioconductor packages). You could use the Biostrings package and

library(Biostrings)
dna = readDNAStringSet("path/to.fasta")

and do many useful things, including those described in the quick reference. If at the end you want a single character vector, then

as.character(dna[1])

or

as.character(dna[names(dna) == "Seq3"])
0
On

I am guessing that you are using ape package. Using the example in ?read.dna

library(ape)
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = "exdna.txt", sep = "\n")
ex.dna4 <- read.dna("exdna.txt", format = "fasta")

ex.dna4[dimnames(ex.dna4)[[1]]=='No304',]
#1 DNA sequences in binary format stored in a matrix.

#All sequences of same length: 40 

#Labels: No304 

#Base composition:
#    a     c     g     t 
#0.475 0.300 0.025 0.200 

as.character(ex.dna4[dimnames(ex.dna4)[[1]]=='No304'])
#[1] "a" "t" "t" "c" "g" "a" "a" "a" "a" "a" "c" "a" "c" "a" "c" "c" "c" "a" "c"
#[20] "t" "a" "c" "t" "a" "a" "a" "a" "a" "t" "t" "a" "t" "c" "a" "a" "c" "c" "a"
#[39] "c" "t"