Gaussian mixture models for probabilistic classification of breast cancer

2019 
In the era of omics-driven research, how to stratify individual patients based on the molecular characteristics of their tumors remains a common dilemma. To improve molecular stratification of patients with breast cancer, we developed a Gaussian mixture model (GMM)–based classifier. This probabilistic classifier was built on mRNA expression data from more than 300 clinical samples of breast cancer and healthy tissue and was validated on datasets of ESR1, PGR, and ERBB2, which encode standard clinical markers and therapeutic targets. To demonstrate how a GMM approach could be exploited for multiclass classification using data from a candidate marker, we analyzed the insulin-like growth factor I receptor (IGF1R), a promising target but a marker of uncertain importance in breast cancer. The GMM defined subclasses with downregulated (40%), unchanged (39%), upregulated (19%), and overexpressed (2%) IGF1R levels; inter- and intrapatient analyses of IGF1R transcript and protein levels supported these predictions. Overexpressed IGF1R was observed in a small percentage of tumors. Samples with unchanged and upregulated IGF1R were differentiated tumors, and downregulation of IGF1R correlated with poorly differentiated, high-risk hormone receptor–negative and HER2-positive tumors. A similar correlation was found in the independent cohort of carcinoma in situ, suggesting that loss or low expression of IGF1R is a marker of aggressiveness in subsets of preinvasive and invasive breast cancer. These results demonstrate the importance of probabilistic modeling that delves deeper into molecular data and aims to improve diagnostic classification, prognostic assessment, and treatment selection.

Significance: A GMM classifier demonstrates potential use for clinical validation of markers and determination of target populations, particularly when availability of specimens for marker development is low.
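
To make the idea of probabilistic classification with a GMM concrete, the following is a minimal sketch, not the classifier described in the abstract, whose implementation details are not given here. It assumes a single marker measured on a continuous scale, fits a univariate mixture with plain expectation-maximization, and assigns a sample to the component with the highest posterior probability. The function names (`normal_pdf`, `posteriors`, `fit_gmm_1d`), the two-component setup, and the synthetic values are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for one value x."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    if total == 0.0:  # x is far from every component; fall back to the nearest mean
        nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
        return [1.0 if j == nearest else 0.0 for j in range(len(means))]
    return [p / total for p in joint]


def fit_gmm_1d(values, k, n_iter=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: one component per equal-sized chunk of the sorted data.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(data) / n
    overall_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = [posteriors(x, weights, means, variances) for x in data]
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component; keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor guards against collapse
    return weights, means, variances


# Synthetic stand-in for log-scale expression values (NOT data from the study).
rng = random.Random(0)
expression = ([rng.gauss(-2.0, 0.5) for _ in range(150)]
              + [rng.gauss(2.0, 0.5) for _ in range(150)])

weights, means, variances = fit_gmm_1d(expression, k=2)

# Classify a new sample: one posterior per component, label = most probable one.
probs = posteriors(1.8, weights, means, variances)
label = max(range(len(probs)), key=lambda j: probs[j])
```

Because the output is a posterior probability per subclass rather than a hard threshold call, borderline samples can be flagged as uncertain instead of being forced into a class. A four-subclass setting like the one reported for IGF1R would correspond to `k=4` in this sketch, though the number of components in practice would need to be chosen and validated on real data.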