I have FASTQ files from PCR amplicons of a specific target region from several samples. For each sample, I want to know the percentage of reads that align better to reference sequence #1 versus reference sequence #2, both posted below. How should I begin to tackle this question, and which alignment tool is best?
I am working with Illumina paired-end adapter sequences, spiked in on a 2x150 run. The two reference amplicons are 173 and 179 bp:
1: aaaaagtataaatataggaccaggcagagcattttatacaacaggagaaataataggagatataagacaagcacattgtaaccttagtagagcaaaatggaatgacactttaaataagatagttataaaattaagagaacaatttgggaataaaacaatagtctttaagcact
2: aaaaagtatccgtatccagaggggaccagggagagcatttgttacaataggaaaaataggaaatatgagacaagcacattgtaacattagtagagcaaaatggaatgccactttaaaacagatagctagcaaattaagagaacaatttggaaataataaaacaataatctttaagcaat
We want to know whether one virus outcompetes the other after infection, based on the differences between these two sequences; essentially, we need the percentage of reads that align best to #1 and the percentage that align best to #2.
Thank you,
Sara
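1. Save the two reference sequences together in a single file in fasta format.
2. Align the reads to that reference with bwa mem, bowtie2, etc.
3. Use samtools idxstats to find the number of reads aligned to each of the amplicons.

Notes:
- Trim the adapter sequences before aligning, using flexbar, skewer, etc.
- All of these tools can be installed with conda.

REFERENCES: conda, bwa, bowtie2, samtools, flexbar, skewer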
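The steps above can be sketched in Python as follows. This is only an illustration, not a tested pipeline: the function names, file names, and the reference names (`amplicon1`, `amplicon2`) are hypothetical, the percentages are computed over mapped reads only (an assumption, since unmapped reads could also go in the denominator), and `idxstats_for_sample` assumes `bwa` and `samtools` are on the PATH and that the reads have already been adapter-trimmed. Note that `samtools idxstats` needs a coordinate-sorted, indexed BAM, and that with paired-end data each mate is counted separately.

```python
import subprocess


def write_reference(path, amplicons):
    """Write a {name: sequence} mapping to a FASTA file."""
    with open(path, "w") as fh:
        for name, seq in amplicons.items():
            fh.write(f">{name}\n{seq.upper()}\n")


def parse_idxstats(text):
    """Return {reference_name: mapped_count} from `samtools idxstats` output.

    Each line is: name, length, mapped, unmapped (tab-separated).
    The final '*' line holds reads with no reference and is skipped.
    """
    counts = {}
    for line in text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name != "*":
            counts[name] = int(mapped)
    return counts


def percentages(counts):
    """Convert per-reference counts to percentages of all mapped reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


def idxstats_for_sample(ref, r1, r2, bam):
    """Align one paired-end sample and return the idxstats text.

    Requires bwa and samtools to be installed; not exercised here.
    """
    subprocess.run(["bwa", "index", ref], check=True)
    # Pipe `bwa mem` output straight into `samtools sort`.
    mem = subprocess.Popen(["bwa", "mem", ref, r1, r2], stdout=subprocess.PIPE)
    subprocess.run(["samtools", "sort", "-o", bam, "-"],
                   stdin=mem.stdout, check=True)
    mem.stdout.close()
    mem.wait()
    subprocess.run(["samtools", "index", bam], check=True)
    result = subprocess.run(["samtools", "idxstats", bam],
                            check=True, capture_output=True, text=True)
    return result.stdout
```

For one sample, something like `percentages(parse_idxstats(idxstats_for_sample("ref.fa", "s1_R1.fastq", "s1_R2.fastq", "s1.bam")))` would give the two percentages; the file names here are placeholders. Because the two amplicons are very similar, it may be worth checking how many reads map with low mapping quality before trusting the split.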