Problem with using the Read10X function of Seurat


I am following the tutorial on the Seurat website (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html) and trying to import a dataset I found on GitHub for practice, but I ran into a problem. I know my file is not in the right format (it should be a .tsv file), but the only file I can download is a .csv. How should I approach this?

My code

This is the link to the data I'm trying to download: https://figshare.com/articles/dataset/SEATRAC_TB_Hackday_2023/24425053?file=42875377 and I'm trying to download darrah's data set.

I tried to convert the csv to tsv, but I couldn't find a way.


There are 1 best solutions below


Seurat::Read10X expects a directory of files in the 10X format. The data you linked to looks like a .tsv file, so you should read it into R using a function meant for tabular data. The file is large, so read.table() is likely to be slow. You can use data.table::fread() instead.

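library(Seurat)
library(data.table)
# Read the whole table; the first column holds the gene names
dat <- fread("darrah_Week25.Filtered.cells.txt")
# Every column except the first holds the values for one cell
mat <- as.matrix(dat[, -1])
# Take the gene names by position, so this works whatever fread named the column
rownames(mat) <- dat[[1]]
# Convert to a sparse matrix before building the Seurat object
mat <- as.sparse(mat)
seurat <- CreateSeuratObject(mat)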
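
As background to the first point: a 10X-format directory holds three files, matrix.mtx, barcodes.tsv, and either features.tsv or genes.tsv (each possibly gzipped), which is why pointing Read10X at a single text file fails. A minimal sketch of a check for that layout, using only base R; `looks_like_10x_dir` is a hypothetical helper written for illustration, not a Seurat function:

```r
# Hypothetical helper (not part of Seurat): report whether a directory
# contains the three files that make up a 10X matrix directory.
looks_like_10x_dir <- function(dir) {
  # Strip a trailing .gz so gzipped and plain files are treated alike
  files <- sub("\\.gz$", "", list.files(dir))
  has_matrix   <- "matrix.mtx" %in% files
  has_barcodes <- "barcodes.tsv" %in% files
  # Older outputs name the gene table genes.tsv, newer ones features.tsv
  has_features <- any(c("features.tsv", "genes.tsv") %in% files)
  has_matrix && has_barcodes && has_features
}
```

If this returns FALSE for the folder containing your download, the data is not in 10X format and a tabular reader such as fread() is the appropriate route.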

Edit: The file is large, so if memory is an issue, a range of columns can be specified. That way it may be possible to read in all the data in chunks.

# Gene names from the first column
genes <- fread("darrah_Week25.Filtered.cells.txt", select = 1)
# First chunk of cell columns
dat <- fread("darrah_Week25.Filtered.cells.txt", select = 2:5000)
mat1 <- as.sparse(as.matrix(dat))
# Next chunk; reusing `dat` lets the previous chunk be freed
dat <- fread("darrah_Week25.Filtered.cells.txt", select = 5001:10000)
mat2 <- as.sparse(as.matrix(dat))

Choose suitable column ranges and repeat as needed. It may help to run garbage collection manually with gc() to free up memory. Finally, cbind the subsets:

mat <- cbind(mat1, mat2)
rownames(mat) <- genes$V1
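
The repeat-as-needed step above can be wrapped in a loop. The following is a rough sketch, not a tested recipe for this particular file: `read_in_chunks` and its `chunk_size` argument are hypothetical names, it assumes the first column holds the gene names and all other columns are numeric, and it uses `as(x, "CsparseMatrix")` from the Matrix package in place of Seurat's as.sparse() so that it runs without Seurat loaded.

```r
library(data.table)
library(Matrix)

# Hypothetical helper: read a wide gene-by-cell text file in column chunks
# and return a single sparse matrix with gene names as row names.
read_in_chunks <- function(path, chunk_size = 5000) {
  # nrows = 0 reads no data rows, only enough to learn the column layout
  n_cols <- ncol(fread(path, nrows = 0))
  stopifnot(n_cols >= 2)
  # Assumption: column 1 holds the gene names
  genes <- fread(path, select = 1)[[1]]
  # First column index of each chunk, starting after the gene column
  starts <- seq(2, n_cols, by = chunk_size)
  chunks <- lapply(starts, function(s) {
    cols <- s:min(s + chunk_size - 1, n_cols)
    dat <- fread(path, select = cols)
    # Only the sparse copy of the chunk is kept
    as(as.matrix(dat), "CsparseMatrix")
  })
  mat <- do.call(cbind, chunks)
  rownames(mat) <- genes
  mat
}
```

With that in place, `mat <- read_in_chunks("darrah_Week25.Filtered.cells.txt")` followed by `CreateSeuratObject(mat)` would replace the manual steps. A smaller `chunk_size` lowers peak memory at the cost of more passes over the file.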