I downloaded a raw data set from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92332) which contains single-cell analysis data. There are three different file types: matrix.mtx.gz, barcodes.tsv.gz and genes.tsv.gz.
I tried to run this code to load the data:
#Load data
data_file = "/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW"
adata = sc.read(data_file, cache=True)
adata = adata.transpose()
adata.X = adata.X.toarray()
But I always get the following ValueError:
ValueError: Reading with filekey '/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW/MTX/mtx.gv' failed, the inferred filename PosixPath('/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW/MTX/mtx.gv.h5ad') does not exist. If you intended to provide a filename, either use a filename ending on one of the available extensions {'csv', 'data', 'tab', 'h5ad', 'anndata', 'h5', 'tsv', 'xlsx', 'loom', 'txt', 'mtx.gz', 'soft.gz', 'mtx'} or pass the parameter `ext`.
I understand that I need to add an extension, but no matter which extension I add, I still get the same error.
I tried all the listed extensions that match my file types (mtx.gz etc.), and I also made a separate folder containing only the MTX data and tried pointing to that, but nothing works.
The `scanpy.read` method is meant for single files such as `.h5ad`, not for a folder of CellRanger output. If loading raw CellRanger MTX, then you should use the `scanpy.read_10x_mtx` method and pass it the folder that contains the matrix, genes and barcodes files.

As commented, the .mtx and .tsv files likely need to be unzipped (run `gzip -d *.gz` from the command line while in the folder). This is idiosyncratic to `scanpy`, which requires data with `genes.tsv` (pre-v3 CellRanger output) to be unzipped, whereas data with `features.tsv` (v3+ CellRanger output) can stay zipped. At least that's what the code shows.

Since this appears to be many runs, you may also need the `prefix` argument to specify which particular run you want to load.
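
Putting the points above together, here is a minimal sketch rather than a definitive recipe. The helper names `gunzip_folder`, `find_prefixes` and `load_run` are made up for illustration. It assumes that each run's files sit in one folder and are named `<prefix>matrix.mtx`, `<prefix>genes.tsv` and `<prefix>barcodes.tsv` once unzipped, and that your scanpy version supports the `prefix` argument of `read_10x_mtx`. The first helper is a Python stand-in for `gzip -d *.gz`, except that it keeps the `.gz` originals.

```python
import gzip
import shutil
from pathlib import Path

# File name endings scanpy looks for in legacy (pre-v3) CellRanger output.
SUFFIXES = ("matrix.mtx", "genes.tsv", "barcodes.tsv")


def gunzip_folder(folder):
    """Decompress every *.gz file in `folder`, keeping the originals."""
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        if not out_path.exists():
            with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)


def find_prefixes(folder):
    """Return the file name prefixes that have all three legacy 10x files."""
    names = {p.name for p in Path(folder).iterdir() if p.is_file()}
    candidates = {
        name[: -len("matrix.mtx")] for name in names if name.endswith("matrix.mtx")
    }
    return sorted(
        prefix
        for prefix in candidates
        if all(prefix + suffix in names for suffix in SUFFIXES)
    )


def load_run(folder, prefix=None):
    """Load one run with scanpy (imported here so the helpers work without it)."""
    import scanpy as sc

    # read_10x_mtx returns cells x genes already, so no transpose is needed.
    return sc.read_10x_mtx(folder, var_names="gene_symbols", prefix=prefix)
```

You would call `gunzip_folder(data_file)` once, look at what `find_prefixes(data_file)` returns, and then pass one of those prefixes to `load_run(data_file, prefix=...)`. If the files are literally named `matrix.mtx`, `genes.tsv` and `barcodes.tsv`, the prefix is an empty string and can be left out.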