Scanpy - single-cell RNA sequencing - pre-processing of individual vs. integrated datasets

160 Views Asked by Ofir Shorer At 27 September 2023 at 08:47

I'm using Scanpy to analyze an integrated single-cell dataset composed of two different datasets.

In the default preprocessing workflow provided by Scanpy's authors, cells are filtered based on criteria such as their mitochondrial gene expression.

Next, the effects of total counts and mitochondrial gene expression are regressed out using sc.pp.regress_out, and the genes are then scaled to unit variance using sc.pp.scale.

Should these preprocessing steps be applied to each dataset separately before integration, or to the combined data after integration? I'm asking specifically about regressing out and scaling.

For example, running sc.pp.scale before integration gives each gene a similar distribution in both datasets, which would remove possible differences between the datasets once they are integrated. So it seems this step should be performed after integration.
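
To make that concern concrete, here is a minimal sketch in plain Python. The zscore function is a simplified stand-in for per-gene scaling, not Scanpy's own implementation, and the expression values are made up.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center to mean 0 and scale to unit variance (simplified per-gene scaling)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical log-expression of one gene; dataset B is shifted upward by 2.
gene_in_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_in_b = [3.0, 4.0, 5.0, 6.0, 7.0]

# Scaling each dataset on its own centers both at 0, so the shift disappears.
separate_a, separate_b = zscore(gene_in_a), zscore(gene_in_b)
gap_separate = mean(separate_b) - mean(separate_a)

# Scaling after concatenation keeps the shift, in units of the pooled std.
joint = zscore(gene_in_a + gene_in_b)
joint_a, joint_b = joint[: len(gene_in_a)], joint[len(gene_in_a):]
gap_joint = mean(joint_b) - mean(joint_a)

print(f"gap after separate scaling: {gap_separate:.3f}")
print(f"gap after joint scaling:    {gap_joint:.3f}")
```

Whether the preserved gap is a real biological difference or a batch effect is a separate question; the sketch only shows that separate scaling cannot keep it.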

However, each dataset originally has a different number of genes sequenced, so applying sc.pp.regress_out after integration seems like a mistake, because the total counts depend on the number of genes sequenced in each dataset. So it seems this step should be performed before integration.
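
A small illustration of why the total counts depend on the gene set, again in plain Python with made-up numbers. The gene names and the total_counts helper are hypothetical, and restricting to shared genes is just one way the two datasets could be made comparable.

```python
# Hypothetical counts for one cell from each dataset, keyed by gene name.
cell_a = {"GENE1": 5, "GENE2": 3, "GENE3": 2, "GENE4": 10}  # dataset A: 4 genes
cell_b = {"GENE1": 5, "GENE2": 3, "GENE3": 2}               # dataset B: 3 genes


def total_counts(cell, genes=None):
    """Sum a cell's counts over the given genes (all of its genes by default)."""
    genes = cell.keys() if genes is None else genes
    return sum(cell[g] for g in genes)


# Totals over each dataset's own gene set differ even though the
# shared genes have identical counts.
total_a_full = total_counts(cell_a)
total_b_full = total_counts(cell_b)

# Totals over the intersection of the two gene sets are directly comparable.
shared = sorted(cell_a.keys() & cell_b.keys())
total_a_shared = total_counts(cell_a, shared)
total_b_shared = total_counts(cell_b, shared)

print(total_a_full, total_b_full, total_a_shared, total_b_shared)
```

So the covariate passed to sc.pp.regress_out may mean something different depending on whether it was computed on each dataset's full gene set or only on the genes that survive integration.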
