scTPC: a novel semisupervised deep clustering model for scRNA-seq data.
Motivation: Continuous advances in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to study cell heterogeneity, infer cell trajectories, identify rare cell types, and investigate questions in neurology in greater depth. Accurate clustering of scRNA-seq data is a crucial step in single-cell sequencing analysis. However, the high dimensionality and sparsity of the data, together with "false" zero values (dropout events), make clustering difficult. Furthermore, current unsupervised clustering algorithms do not effectively leverage prior biological knowledge, which makes cell clustering even more challenging.
Results: This study presents scTPC, a deep-learning-based semisupervised clustering model that integrates a triplet constraint, a pairwise constraint, and a cross-entropy constraint. The model first pretrains a denoising autoencoder based on a zero-inflated negative binomial (ZINB) distribution. It then performs deep clustering in the learned latent feature space, using triplet and pairwise constraints generated from a subset of labeled cells. Finally, to handle datasets with imbalanced cell types, a weighted cross-entropy loss is introduced to optimize the model. Experiments on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC clusters cells accurately.
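The four loss terms named in the Results can be illustrated with a minimal plain-Python sketch. It is not the scTPC implementation (that is in the linked repository) and rests on assumptions: the ZINB term uses the standard parameterization with mean mu, dispersion theta and dropout probability pi; the triplet term is the usual hinge on squared Euclidean distances between latent embeddings; the pairwise term uses a common must-link/cannot-link formulation on soft cluster assignments; and the class weights are inverse class frequencies. All function names and the default margin are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating assumed loss formulations, not scTPC's code.


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of one count x under a zero-inflated negative
    binomial with mean mu > 0, dispersion theta > 0 and dropout prob pi."""
    log_nb = (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu) + eps)
        + x * math.log(mu / (theta + mu) + eps)
    )
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return -math.log(pi + (1 - pi) * math.exp(log_nb) + eps)
    # A nonzero count can only come from the NB component.
    return -(math.log(1 - pi + eps) + log_nb)


def sq_dist(u, v):
    """Squared Euclidean distance between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: the anchor should be closer to a same-label cell
    (positive) than to a different-label cell (negative) by at least margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def pairwise_loss(q_i, q_j, must_link, eps=1e-10):
    """Constraint loss on soft cluster assignments q_i, q_j (probability
    vectors). Must-link pairs are pushed into the same cluster,
    cannot-link pairs into different clusters."""
    same = sum(a * b for a, b in zip(q_i, q_j))  # prob. both share a cluster
    return -math.log(same + eps) if must_link else -math.log(1 - same + eps)


def class_weights(labels):
    """Inverse-frequency weights so that rare cell types count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-10):
    """Mean weighted negative log-probability of each cell's true label;
    labels are integer indices into each probability vector."""
    total = sum(-weights[y] * math.log(p[y] + eps) for p, y in zip(probs, labels))
    return total / len(labels)
```

In a trainable model these terms would be computed on minibatches in an autodiff framework and summed with weighting coefficients alongside the reconstruction loss; those coefficients are hyperparameters not given in this abstract. For example, `class_weights([0, 0, 0, 1])` gives the rare label 1 a weight of 2.0 and the common label 0 a weight of about 0.67.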
Availability and Implementation: scTPC is implemented in Python, and the code is available at https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780.
(© The Author(s) 2024. Published by Oxford University Press.)