Comprehensive evaluation of human brain gene expression deconvolution methods

2020 
Gene expression measurements, like DNA methylation and proteomic measurements, are influenced by the cellular composition of the sample analysed. Deconvolution of bulk transcriptome data aims to estimate the cellular composition of a sample from its gene expression data, which in turn can be used to correct for composition differences across samples. Although a multitude of deconvolution methods have been developed, it is unclear whether their performance is consistent across tissues with different complexities of cellular composition. For example, the human brain is unique in its transcriptomic diversity and in the complexity of its cellularity, yet a comprehensive assessment of the accuracy of transcriptome deconvolution methods on human brain data is currently lacking. Here we carry out the first comprehensive comparative evaluation of the accuracy of deconvolution methods for human brain transcriptome data, and assess the tissue-specificity of our key observations by comparison with transcriptome data from human pancreas. We evaluate 22 transcriptome deconvolution approaches, covering all main classes: 3 partial deconvolution methods, each applied with 6 different categories of cell-type signature data, 2 enrichment methods and 2 complete deconvolution methods. We test the accuracy of cell-type estimates using in silico mixtures of single-cell RNA-seq data, mixtures of neuronal and glial RNA, as well as nearly 2,000 human brain samples. Our results bring several important insights into the performance of transcriptome deconvolution: (a) We find that cell-type signature data has a stronger impact on brain deconvolution accuracy than the choice of method. In contrast, the cell-type signature only mildly influences deconvolution of pancreas transcriptome data, highlighting the importance of tissue-specific benchmarking. (b) We demonstrate that biological factors influencing brain cell-type signature data (e.g. brain region, in vitro cell culturing) have stronger effects on the deconvolution outcome than technical factors (e.g. RNA sequencing platform). (c) We find that partial deconvolution methods outperform complete deconvolution methods on human brain data. (d) We demonstrate that the impact of cellular composition differences on differential expression analyses is tissue-specific, and more pronounced for brain than for pancreas. To facilitate wider implementation of correction for cellular composition, we develop a novel brain cell-type signature, MultiBrain, which integrates single-cell, immuno-panned, and single-nucleus datasets. We demonstrate that it achieves improved deconvolution accuracy over existing reference signatures. Deconvolution of transcriptome data from autism cases and controls using MultiBrain identified cell-type composition changes replicable across studies, and highlighted novel genes dysregulated in autism.
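To make the idea of partial deconvolution concrete, the following is a minimal sketch rather than any of the methods evaluated here. It assumes that a bulk expression profile is a linear mixture of cell-type signature profiles, builds a noise-free in silico mixture, and recovers the proportions by non-negative least squares solved with projected gradient descent. The signature values, cell-type labels, and function names (`mix`, `deconvolve`) are hypothetical and chosen only for illustration.

```python
# Toy partial deconvolution: bulk = signature x proportions, proportions >= 0.
# All numbers below are made up for illustration, not taken from any dataset.

# Rows are genes, columns are cell types (hypothetical: neurons, astrocytes,
# oligodendrocytes). The first three genes act as markers, the last is shared.
SIGNATURE = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]


def mix(signature, proportions):
    """Return the bulk profile implied by a linear mixture of cell types."""
    return [sum(s * p for s, p in zip(row, proportions)) for row in signature]


def deconvolve(signature, bulk, steps=500):
    """Estimate cell-type proportions by non-negative least squares.

    Minimises 0.5 * ||signature @ p - bulk||^2 subject to p >= 0 using
    projected gradient descent, then rescales p to sum to 1.
    """
    n_types = len(signature[0])
    # 1 / (squared Frobenius norm) is a safe step size, because the squared
    # Frobenius norm bounds the largest eigenvalue of signature^T signature.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [m - b for m, b in zip(mix(signature, p), bulk)]
        grad = [
            sum(row[k] * r for row, r in zip(signature, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, pk - step * gk) for pk, gk in zip(p, grad)]
    total = sum(p)
    return [pk / total for pk in p] if total > 0 else p


if __name__ == "__main__":
    true_props = [0.6, 0.3, 0.1]
    bulk = mix(SIGNATURE, true_props)  # noise-free in silico mixture
    estimate = deconvolve(SIGNATURE, bulk)
    print([round(x, 4) for x in estimate])
```

Because the estimate depends directly on the columns of the signature matrix, a sketch like this also shows why the choice of signature data can matter as much as the solver: swapping in a different `SIGNATURE` for the same bulk profile changes the recovered proportions.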