Rikard Forlin, MD: No financial relationships to disclose
Introduction/Rationale: Grouping individual cells into clusters and classifying these based on feature expression is a common procedure in single-cell analysis pipelines. Multiple methods have been reported for single-cell mRNA sequencing and cytometry datasets, the vast majority of which rely on sequential 2-step procedures involving I) cell clustering based on notions of similarity and II) cluster annotation via manual or semi-automated methods. However, because arbitrary borders are drawn between more or less similar groups of cells, one cannot guarantee that all cells within a cluster are of the same type or in the same cell state, and there is currently no metric for the certainty or probability of an individual cell's classification. Further, dimensionality reduction has been shown to cause considerable distortion in high-dimensional datasets and is prone to assigning variable annotations to the same cell when the relative composition of the data changes.
Methods: We present an alternative method based on calculation of the Wasserstein distance and a Bayesian Directed Acyclic Graph (DAG), named BayeDAGClassifier (BDC), which annotates one cell at a time, removing the need for prior clustering. The Wasserstein distance is first applied to a training dataset to find the most characteristic features (both negative and positive) of a cell annotation or state. The Bayesian DAG is then trained on these features to find the correct structure and the weight of each feature in its network, classifying one cell at a time with an accompanying metric of certainty that the cell has been classified correctly.
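The feature-selection step can be illustrated with a minimal sketch. It is not the BDC implementation: it assumes that each feature is scored by the one-dimensional Wasserstein distance between its values in cells of the target annotation and its values in all other cells, and that the direction (positive or negative) is taken from the difference in means. The function names, the toy expression values, and the use of CD3, CD19, and CD45 as example features are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import mean


def wasserstein_1d(u, v):
    """1-D Wasserstein (earth mover's) distance between two empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def rank_features(expr, labels, target, top_k=2):
    """Rank features by how strongly they separate `target` cells from the rest.

    expr:   dict mapping feature name -> list of values, one per cell
    labels: list of training annotations, one per cell
    Returns the top_k (feature, distance, direction) tuples.
    """
    scores = []
    for feature, values in expr.items():
        inside = [x for x, lab in zip(values, labels) if lab == target]
        outside = [x for x, lab in zip(values, labels) if lab != target]
        dist = wasserstein_1d(inside, outside)
        # Assumption: a feature is a "negative" marker when the target
        # population expresses it less, on average, than the other cells.
        direction = "positive" if mean(inside) >= mean(outside) else "negative"
        scores.append((feature, dist, direction))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy training set: three "T" cells followed by three "B" cells.
labels = ["T", "T", "T", "B", "B", "B"]
expr = {
    "CD3": [5, 6, 7, 0, 1, 0],
    "CD19": [0, 0, 0, 6, 5, 7],
    "CD45": [4, 5, 4, 5, 4, 5],
}
top_features = rank_features(expr, labels, target="T")
```

In this toy example the uninformative feature receives a small distance and is dropped, while one negative and one positive marker are retained for the target annotation.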
Results: Our classifier showed improved accuracy and provides a manageable metric of uncertainty that can be used to distinguish signal from noise in single-cell data.
Conclusion: BDC is a new way to classify single-cell data based on the Wasserstein distance and a Bayesian DAG. It annotates on a per-cell basis, avoiding high-dimensional clustering and its inherent noise. Further, BDC reports an uncertainty metric that can be used to distinguish signal from noise.
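How a per-cell annotation with an accompanying certainty might look can be sketched as follows. This is an illustration under stated assumptions, not the BDC model: the learned DAG is replaced by the simplest possible Bayesian network, a Gaussian naive Bayes classifier in which the cell type is the only parent of each feature, and the certainty metric is assumed to be the posterior probability of the assigned label. All names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit(cells, labels):
    """Estimate a class prior and per-feature (mean, sd) for every cell type."""
    model = {}
    for cls in set(labels):
        rows = [c for c, lab in zip(cells, labels) if lab == cls]
        params = []
        for column in zip(*rows):
            sd = max(pstdev(column), 1e-3)  # floor to avoid zero variance
            params.append((mean(column), sd))
        model[cls] = (len(rows) / len(cells), params)
    return model


def log_gauss(x, mu, sd):
    """Log density of a normal distribution with mean mu and std. dev. sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def classify(model, cell):
    """Annotate a single cell; return (label, posterior probability of label)."""
    log_post = {}
    for cls, (prior, params) in model.items():
        log_lik = sum(log_gauss(x, mu, sd) for x, (mu, sd) in zip(cell, params))
        log_post[cls] = math.log(prior) + log_lik
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    weights = {cls: math.exp(lp - top) for cls, lp in log_post.items()}
    total = sum(weights.values())
    posterior = {cls: w / total for cls, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior[best]


# Hypothetical training cells described by two selected features.
train_cells = [[5, 0], [6, 0.5], [7, 1], [0, 6], [1, 5], [0.5, 7]]
train_labels = ["T", "T", "T", "B", "B", "B"]
model = fit(train_cells, train_labels)

clear_label, clear_conf = classify(model, [6, 0.5])  # typical "T" profile
_, ambiguous_conf = classify(model, [3, 3])          # between both profiles
```

A cell that resembles one training population receives a posterior close to 1, whereas a cell lying between populations receives a posterior near 0.5 and could be flagged as uncertain rather than forced into a cluster.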