R How to visualize pairwise alignment

4.5k Views Asked by At

How to visualize the complete alignment of two sequences?

library(Biostrings)
s1 <-DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC")
pairwiseAlignment(s1,s2)

Output:

Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC---...CTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG 
subject: [1] GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTA...TATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC 
score: -394.7115 

Here, only a part of alignment has been shown? Do you know of any existing functions that either plot or print the alignment?

1

There are 1 best solutions below

5
On

You can find information and details on how to extract the aligned pattern and subject sequences under ?pairwiseAlignments.

Here is an example based on the sample data you provide:

  1. Store the pairwise alignment in a PairwiseAlignmentsSingleSubject object

    alg <- pairwiseAlignment(s1,s2)
    
  2. Extract the aligned pattern and subject sequences and merge them into a DNAStringSet object.

    seq <- c(alignedPattern(alg), alignedSubject(alg))
    
  3. You can access the full sequences with as.character

    as.character(seq)
    [1] "ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC--------TTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG"
    [2] "GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAA-ATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC"
    

    It seems that alignedPattern and alignedSubject were added to Biostrings very recently. Alternatively you can do

    seq <- c(aligned(pattern(alg)), aligned(subject(alg)))
    

    but note that this will trim globally aligned sequences (see details).

  4. There exists a nice R/Bioconductor package DECIPHER which offers a method to visualise XStringSet data in a web browser. It automatically adds colour-coding and a consensus sequence at the bottom. In your case, you would do

    library(DECIPHER)
    BrowseSeqs(seq)
    

    enter image description here