AdaTiSS: A Novel Data-Adaptive Robust Method for Identifying Tissue Specificity Scores.

2021 
MOTIVATION Accurately detecting tissue specificity (TS) in genes helps researchers understand tissue functions at the molecular level. The Genotype-Tissue Expression project is one of the publicly available data resources, providing large-scale gene expressions across multiple tissue types. Multiple tissue comparisons and heterogeneous tissue expression make it challenging to accurately identify tissue specific gene expression. How to distinguish the inlier expression from the outlier expression becomes important to build the population level information and further quantify the TS. There still lacks a robust and data-adaptive TS method taking into account heterogeneities of the data. METHODS We found that the key to identify tissue specific gene expression is to properly define a concept of expression population. In a linear regression problem, we developed a novel data-adaptive robust estimation based on density-power-weight under unknown outlier distribution and non-vanishing outlier proportion. The Gaussian-population mixture model was considered in the setting of identifying TS. We took into account heterogeneities of gene expression and applied the robust data-adaptive procedure to estimate the population parameters. With the well-estimated population parameters, we constructed the AdaTiSS algorithm. RESULTS Our AdaTiSS profiled TS for each gene and each tissue, which standardized the gene expression in terms of TS. We provided a new robust and powerful tool to the literature of defining tissue specificity. AVAILABILITY https://github.com/mwgrassgreen/AdaTiSS.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    19
    References
    1
    Citations
    NaN
    KQI
    []