Title:
jsPCA: fast, scalable, and interpretable identification of spatial domains and variable genes across multi-slice and multi-sample spatial transcriptomics data
Contributors:
Marseille medical genetics - Centre de génétique médicale de Marseille (MMG), Aix Marseille Université (AMU)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut de Mathématiques de Toulouse UMR5219 (IMT), Université Toulouse Capitole (UT Capitole), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Institut National des Sciences Appliquées (INSA)-Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Université Toulouse - Jean Jaurès (UT2J), Communauté d'universités et établissements de Toulouse (Comue de Toulouse)-Centre National de la Recherche Scientifique (CNRS)-Université de Toulouse (EPE UT), Communauté d'universités et établissements de Toulouse (Comue de Toulouse), Core Cluster of the Institut Français de Bioinformatique (IFB), ANR-CPJ12022-VILLOUTREIX-U125
Publisher Information:
CCSD, 2025.
Publication Year:
2025
Collection:
collection:UNIV-TLSE2
collection:CNRS
collection:UNIV-AMU
collection:INSA-TOULOUSE
collection:IMT
collection:UT1-CAPITOLE
collection:MMG
collection:INSA-GROUPE
collection:UNIV-UT3
collection:UT3-TOULOUSEINP
Original Identifier:
BIORXIV: 2025.09.16.676466
HAL: hal-05391223
Document Type:
Electronic resource (preprint); Preprints; Working Papers
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1101/2025.09.16.676466
DOI:
10.1101/2025.09.16.676466
Rights:
info:eu-repo/semantics/OpenAccess
URL: http://creativecommons.org/licenses/by/
Accession Number:
edshal.hal.05391223v1
Database:
HAL

Abstract:

Spatially structured cell heterogeneity within tissues is essential for healthy organ function. This heterogeneity is reflected in differences in gene expression activity across spatial locations. Spatial transcriptomics technologies record genome-wide measurements of gene expression across entire tissues at high spatial resolution. While they have revolutionized our quantitative understanding of tissue architecture, these technologies generate large, high-dimensional datasets encompassing tens of thousands of genes recorded at tens of thousands of spatial locations, which require efficient automated methods for their analysis. In this study, we introduce joint spatial PCA (jsPCA), a novel, fast, scalable, and interpretable method for the automatic identification of spatial domains and spatially variable genes in multi-slice and multi-sample spatial transcriptomics data. jsPCA relies on a simple mathematical formulation of a spatial covariance, defined as the product of the gene expression covariance with the spatial autocorrelation. The principal components of this spatial covariance yield a biologically meaningful low-dimensional representation, from which spatial domains can be derived by simple clustering. In addition, spatially variable genes can be identified directly from the principal component coefficients. Moreover, this approach enables the joint representation of multiple slices and samples, a frequent experimental setting. This joint representation is obtained without spatial alignment, by computing common principal components through joint diagonalization of the set of spatial covariance matrices obtained for each slice. By leveraging sparsity and non-convex optimization on manifolds, jsPCA achieves computing times on the order of seconds to minutes, substantially faster than state-of-the-art approaches.
We benchmarked jsPCA against 10 state-of-the-art methods on the 10x Visium dataset of the human dorsolateral prefrontal cortex and on the Stereo-seq MOSTA dataset of mouse embryonic development. Our approach achieved excellent performance, comparable to or better than that of state-of-the-art methods such as SpatialPCA, BASS, GraphPCA, and STAGATE, while being much faster, interpretable, and scalable to very large datasets.
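The pipeline summarized in the abstract (spatial covariance per slice, common principal components by joint diagonalization, domains by clustering, spatially variable genes from the coefficients) can be illustrated with a small NumPy sketch. This is not the jsPCA implementation, and several choices are assumptions: the spatial covariance is taken here to be C = Zᵀ W Z / n, with Z the standardized expression matrix and W a symmetrized, row-normalized k-nearest-neighbour weight matrix (so each diagonal entry is that gene's Moran's I); the joint diagonalization uses classical Jacobi rotations (Cardoso and Souloumiac) in place of the sparse manifold optimization the abstract describes; and the function names (standardize, knn_weights, spatial_covariance, joint_diagonalize, common_spatial_pcs, kmeans), k = 6, the gene-ranking score, and the toy data are all hypothetical.

```python
import numpy as np


def standardize(X):
    """Centre each gene (column) and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)


def knn_weights(coords, k=6):
    """Symmetrised, row-normalised k-nearest-neighbour weights (dense, for illustration only)."""
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a spot is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros(d2.shape)
    np.put_along_axis(W, nn, 1.0 / k, axis=1)    # each row sums to 1
    return 0.5 * (W + W.T)                       # symmetric, so C below is symmetric


def spatial_covariance(X, coords, k=6):
    """ASSUMED form C = Z^T W Z / n (hypothetical; diagonal entries are Moran's I per gene)."""
    Z = standardize(X)
    return Z.T @ knn_weights(coords, k) @ Z / Z.shape[0]


def joint_diagonalize(Cs, max_sweeps=100, tol=1e-12):
    """Jacobi-rotation joint diagonalisation (Cardoso-Souloumiac), swapped in for the
    manifold optimiser mentioned in the abstract. Returns orthogonal V and V^T C_s V."""
    A = np.array(Cs, dtype=float)                # shape (S, p, p)
    p = A.shape[1]
    V = np.eye(p)
    for _ in range(max_sweeps):
        rotated = False
        for i in range(p - 1):
            for j in range(i + 1, p):
                g = np.stack([A[:, i, i] - A[:, j, j], A[:, i, j] + A[:, j, i]])
                gg = g @ g.T
                # angle minimising the summed squared (i, j) off-diagonal entries over all slices
                theta = 0.25 * np.arctan2(gg[0, 1] + gg[1, 0], gg[0, 0] - gg[1, 1])
                if abs(theta) > tol:
                    rotated = True
                    c, s = np.cos(theta), np.sin(theta)
                    G = np.array([[c, -s], [s, c]])
                    A[:, [i, j], :] = G.T @ A[:, [i, j], :]   # rotate rows i, j
                    A[:, :, [i, j]] = A[:, :, [i, j]] @ G     # rotate columns i, j
                    V[:, [i, j]] = V[:, [i, j]] @ G
        if not rotated:
            break
    return V, A


def common_spatial_pcs(slices, n_components=2, k=6):
    """Loadings (genes x components) shared by all slices; no spatial alignment is used."""
    V, D = joint_diagonalize([spatial_covariance(X, xy, k) for X, xy in slices])
    spatial_var = np.diagonal(D, axis1=1, axis2=2).mean(axis=0)
    return V[:, np.argsort(spatial_var)[::-1][:n_components]]


def kmeans(Y, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd k-means turning the joint embedding into domain labels."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        centers = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(n_clusters)])
    return labels


# Toy data: two 20 x 20 "slices", 6 genes; genes 0 and 1 follow spatial patterns, 2-5 are noise.
rng = np.random.default_rng(0)
xy = np.array([(x, y) for x in range(20) for y in range(20)], dtype=float)


def toy_slice():
    X = rng.normal(size=(len(xy), 6))
    X[:, 0] = (xy[:, 0] < 10) + 0.1 * rng.normal(size=len(xy))   # left/right pattern
    X[:, 1] = (xy[:, 1] < 10) + 0.1 * rng.normal(size=len(xy))   # top/bottom pattern
    return X, xy


slices = [toy_slice(), toy_slice()]
V = common_spatial_pcs(slices, n_components=2)
svg_score = (V ** 2).sum(axis=1)                           # large loadings -> spatially variable
embedding = np.vstack([standardize(X) @ V for X, _ in slices])
domains = kmeans(embedding, n_clusters=4)
print(np.argsort(svg_score)[::-1][:2])                     # genes 0 and 1 expected, in either order
```

Ranking genes by their summed squared loadings on the retained components is only one simple way to read spatially variable genes off the coefficients. The dense n × n weight matrix and the Jacobi loop, which costs on the order of S·p³ operations per sweep for S slices and p genes, keep the sketch short but would not scale to real datasets; a sparse neighbour graph and a more efficient solver would be needed there.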