Abstract 2001: Non-negative matrix factorization (NMF) as a clinical classifier: An example with chemotherapy response in ovarian cancer

2010 
Proceedings: AACR 101st Annual Meeting 2010; Apr 17-21, 2010; Washington, DC

Large-scale whole-genome assays such as aCGH and expression profiling are providing an overabundance of data on disease states, yet much of this information is not relevant to what drives the disease. There is a need for concise explanatory modules with strong predictive power. Here we describe a semi-supervised machine-learning method for classifying -omics data with associated clinical features. Using NMF, a matrix of -omics data is factorized step-wise into two matrices: a table of k “metagenes” containing coefficients for the membership of each gene, and a table of prediction strengths for each patient in each of the k classes. Where the class of a patient is known, we initially set the prediction strength for that class high; otherwise the prediction strength is set equal across all classes. As the NMF algorithm progresses, the class prediction can be swapped if this is supported by the -omics data. Initially the metagene coefficients are unknown and are seeded randomly. As NMF does not converge onto a unique solution, multiple models were learned, each starting from different random coefficients. These models were cross-validated as predictors against held-out sets of patients, using a correlation statistic different from the one used to train the model. Where a held-out patient correlates with none of the model's k classes, the patient is considered a member of a novel class and can be excluded from classification a priori (abstention). We applied our method to the TCGA ovarian serous carcinoma data set. In this study, aCGH and expression profiles were taken before platinum treatment, and the time to relapse was recorded. These -omics measurements were then integrated into inferred pathway activities using a computational model of the central dogma (DIGMA). We predicted platinum sensitivity (no relapse or relapse after 12 mo) vs. platinum resistance (relapse within 3 mo) on these inferred pathway activities.
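The seeding and abstention steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the NMF objective, the update rule, the seed value, or the correlation statistic, so the Lee-Seung multiplicative updates for the Frobenius objective, the value `high=0.9`, Pearson correlation, the threshold `min_corr`, and all function names below are assumptions chosen for illustration. The sketch also assumes k >= 2 and a non-negative input matrix.

```python
import numpy as np

def seed_strengths(labels, k, high=0.9):
    """Initial k x n prediction-strength matrix (hypothetical helper).

    labels[j] is a class index in 0..k-1, or -1 when the class is unknown.
    Known patients get `high` for their class (assumed value); unknown
    patients get equal strength 1/k for every class. Requires k >= 2.
    """
    H = np.full((k, len(labels)), 1.0 / k)
    for j, c in enumerate(labels):
        if c >= 0:
            H[:, j] = (1.0 - high) / (k - 1)
            H[c, j] = high
    return H

def seeded_nmf(V, labels, k, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (genes x patients) as W @ H.

    W (genes x k) holds the metagenes and starts random; H (k x patients)
    starts from the label-based seed. Both are then updated freely, so a
    seeded class can be overturned if the data support it. Multiplicative
    updates are an assumed choice of NMF algorithm.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = seed_strengths(labels, k)
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H

def classify(W, x, min_corr=0.2):
    """Assign held-out profile x to its best-correlated metagene.

    Returns the class index, or -1 (abstain) when no metagene reaches
    the assumed Pearson-correlation threshold `min_corr`.
    """
    corr = np.array([np.corrcoef(W[:, c], x)[0, 1] for c in range(W.shape[1])])
    best = int(np.argmax(corr))
    return best if corr[best] >= min_corr else -1
```

Running `seeded_nmf` with several values of `seed` gives the multiple models mentioned above; each can then be scored on held-out patients with `classify`, counting a return value of -1 as an abstention.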
Consistently, the model with the highest predictive accuracy in a cross-validation setting placed patients into the correct class >80% of the time. With abstention, it is possible to achieve >90% correct prediction on a subset of the patients. These results provide useful prognostic indicators as well as indicators of which events drive the disease.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 101st Annual Meeting of the American Association for Cancer Research; 2010 Apr 17-21; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2010;70(8 Suppl):Abstract nr 2001.