IMIX: A multivariate mixture model approach to integrative analysis of multiple types of omics data

2020 
MotivationIntegrative genomic analysis is a powerful tool to study the biological mechanisms underlying a complex disease or trait across multiplatform high-dimensional data, such as DNA methylation, copy number variation (CNV), and gene expression. It is common to perform large-scale genome-wide association analysis of an outcome for each data type separately and combine the results ad hoc, leading to loss of statistical power and uncontrolled overall false discovery rate (FDR). ResultsWe propose a multivariate mixture model framework (IMIX) that integrates multiple types of genomic data and allows examining and relaxing the commonly adopted conditional independence assumption. We investigate across-data-type FDR control in IMIX, and show the gain in lower misclassification rates at controlled over-all FDR compared with established individual data type analysis strategies, such as Benjamini-Hochberg FDR control, the q-value, and the local FDR control by extensive simulations. IMIX features statistically-principled model selection, FDR control, and computational efficiency. Applications to the Cancer Genome Atlas (TCGA) data provide novel multi-omic insights into the luminal/basal subtyping of bladder cancer and the prognosis of pancreatic cancer. Availability and implementationWe have implemented our method in R package "IMIX" with instructions and examples available at https://github.com/ziqiaow/IMIX.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    0
    Citations
    NaN
    KQI
    []