LRSK: a low-rank self-representation K-means method for clustering single-cell RNA-sequencing data†
Abstract
The development of single-cell RNA-sequencing (scRNA-seq) technologies brings tremendous opportunities for quantitative research and analysis at the cellular level. In particular, single-cell clustering, a crucial task in scRNA-seq analysis, reveals natural groupings of cells and offers new insights into biological mechanisms and disease. However, identifying cell clusters effectively and accurately from large mixtures of cells remains a challenge. In this paper, we propose a novel adaptive joint clustering framework, the low-rank self-representation K-means method (LRSK), which learns the data representation matrix and the cluster indicator matrix jointly from scRNA-seq data. Specifically, instead of calculating similarities among cells from the original data, we seek a low-rank representation of the original data that better reflects the underlying relationships among cells. An augmented Lagrangian multiplier (ALM) based optimization algorithm is adopted to solve the resulting problem. Experimental results on various scRNA-seq datasets and case studies demonstrate that our method performs better than other state-of-the-art single-cell clustering algorithms. Analysis of a large unlabeled single-cell liver cancer sequencing dataset further shows that our prediction results are more reasonable and interpretable.
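For orientation only, the two ingredients named in the abstract (a low-rank self-representation of the cells, followed by K-means) can be illustrated with a simplified two-stage sketch. This is not the joint ALM-based LRSK algorithm, whose objective is not stated here. It assumes a standard nuclear-norm regularized self-representation problem, min_Z ||Z||_* + (tau/2) ||X - XZ||_F^2, which admits a closed-form solution through the SVD of X. The function names, the parameter `tau`, and the toy data are all hypothetical.

```python
import numpy as np


def low_rank_self_representation(X, tau):
    """Closed-form minimizer of ||Z||_* + (tau/2) * ||X - X Z||_F^2.

    X is genes-by-cells (columns are cells); the result Z is cells-by-cells.
    Assumption: this generic objective stands in for the paper's own model.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > 1.0 / np.sqrt(tau)              # discard weak (noise) directions
    V1 = Vt[keep].T                            # cells x r right singular vectors
    shrink = 1.0 - 1.0 / (tau * s[keep] ** 2)  # shrinkage of retained directions
    return (V1 * shrink) @ V1.T


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(X, k, tau):
    """Two-stage stand-in: build Z first, then run K-means on the rows of |Z|."""
    Z = low_rank_self_representation(X, tau)
    return kmeans(np.abs(Z), k), Z


# Toy data (hypothetical): 50 genes, 3 cell types, 20 cells per type.
rng = np.random.default_rng(0)
means = 3.0 * rng.standard_normal((50, 3))
truth = np.repeat(np.arange(3), 20)
X = means[:, truth] + 0.1 * rng.standard_normal((50, 60))

labels, Z = cluster_cells(X, k=3, tau=0.04)
```

In this toy setting, Z is close to a block-diagonal matrix whose blocks correspond to the cell types, which is why clustering its rows is easier than clustering the raw expression profiles. A joint formulation such as the one proposed here would instead update the representation and the cluster indicators together rather than in two separate stages.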