spliced and unspliced sequence alignment using STAR

196 Views Asked by At

I am working with single cell sequencing data, and want to run this through RNA velocity (https://www.nature.com/articles/s41586-018-0414-6). For that, I need to map both spliced and unspliced reads. The dataset I am working with is a SMARTseq dataset, without UMIs or barcodes (https://www.embopress.org/doi/full/10.15252/embj.2018100164). Data are provided as a single sra file per cell, which can be fetched and unpacked into a single-ended fastq file through GEO using the SRAtoolkit.

What I aimed to do was map the data twice, using STAR alignment. First I wanted to map the reads to exonic regions, then again to intronic regions. To do this, I downloaded the fasta and gtf file from the ensembl reference genome GRCm38 build 100 (https://www.ensembl.org/Mus_musculus/Info/Index). The gtf file does not by itself contain intronic info, which I added in R using the following code:

library(gread) # https://rdrr.io/github/asrinivasan-oa/gread/
library(rtracklayer)

dir <- "/to/reference/genome"
gtf_file <- file.path(dir, "GRCm38build100.gtf")
gtf <- read_format(gtf_file)
gtf.new <- construct_introns(gtf, update = TRUE)[]
export(gtf.new, "GRCm38build100withIntrons.gtf", format = "gtf")

Then, I generated a STAR genome as follows:

STAR \
--runMode genomeGenerate \
--runThreadN 8 \
--sjdbOverhang 50 \
--genomeDir GRCm38build100/50bp/ \
--genomeFastaFiles GRCm38build100/GRCm38build100.fa \
--sjdbGTFfile GRCm38build100/GRCm38build100withIntrons.gtf \

The code for mapping reads against exonic sequences was as follows:

STAR \
--runMode alignReads \
--runThreadN 8 \
--genomeDir GRCm38build100/50bp \
--readFilesIn $R1 \
--outSAMtype None \
--twopassMode Basic \
--sjdbGTFfeatureExon exon \
--quantMode GeneCounts \
--outFileNamePrefix output/${SAMPLE}_exon_

This worked just fine. For unspliced reads, I wanted to do the following:

STAR \
--runMode alignReads \
--runThreadN 8 \
--genomeDir GRCm38build100/50bp \
--readFilesIn $R1 \
--outSAMtype None \
--twopassMode Basic \
--sjdbGTFfeatureExon intron \
--quantMode GeneCounts \
--outFileNamePrefix output/${SAMPLE}_intron_

Strangely, this gave me the exact same result as mapping against exonic reads. I'm confused, I'm sure I'm doing something wrong, but I cannot figure out what. Any help would be much appreciated!

Best regards, Leon

0

There are 0 best solutions below