Using autoencoders and text mining to characterize single cell populations in the hippocampus and cortex

2018 
Revolutionary advances in genomic technology have allowed researchers to address biological questions about cell types, states, and gene regulation at the scale of single cells. However, the ability to characterize the gene expression and function of individual cells brings with it new data-related challenges, such as high dimensionality, feature reduction, and noise reduction. The central objective of this research was to apply existing methods in a novel way to single-cell gene expression data in order to better characterize sub-populations of cell types in various regions of the brain. The approach used a computational bioinformatics pipeline for single-cell RNA-sequencing (RNA-seq) normalization and clustering. Data science methodologies, such as autoencoding and text mining, were adapted to identify candidate gene sets that distinguish different types of cells in the central nervous system. The functional themes of these gene sets were then inferred using a combination of functional enrichment of gene ontology terms and topic modeling. Topic modeling revealed various functional themes among the clusters, in some cases reinforcing the results of biomarker analysis and in other cases providing further insight into potential functional differences between clusters. For one cluster in the cortex, an immune theme emerged, with stemmed words as specific as “immun” and “antigen” appearing in the results. In the hippocampus, clusters determined to be neurons could be further differentiated by themes related to various organs. One of these clusters featured a vascular theme with words related to “endotheli.” Future applications of these methods are intended to explore specific cellular processes in relation to immune function and translational research on neurological disease states.
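The functional enrichment step mentioned above can be illustrated with a minimal sketch. The abstract does not say which statistical test was used; the one-sided hypergeometric over-representation test below is a common choice for gene ontology enrichment and is an assumption here, as are the function names and the toy gene identifiers.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided over-representation p-value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes that carry the ontology term
    selected:   genes in the cluster's candidate gene set
    overlap:    selected genes that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the hypergeometric probability mass from `overlap` up to the maximum.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def term_enrichment(cluster_genes, term_genes, background_genes):
    """Hypothetical helper: restrict both sets to the background, then test."""
    cluster = set(cluster_genes) & set(background_genes)
    term = set(term_genes) & set(background_genes)
    return hypergeom_enrichment_p(
        len(set(background_genes)), len(term), len(cluster), len(cluster & term)
    )


# Toy data with made-up identifiers (not genes or results from the study).
background = [f"g{i}" for i in range(20)]
immune_term = ["g0", "g1", "g2", "g3", "g4"]
cluster_set = ["g0", "g1", "g2", "g3", "g10"]
p_value = term_enrichment(cluster_set, immune_term, background)
print(f"enrichment p-value: {p_value:.4g}")
```

A small p-value suggests that the term is over-represented in the cluster's gene set relative to the background. In practice the test would be repeated across many ontology terms and corrected for multiple testing.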