Application of machine learning on colonoscopy screening records for predicting colorectal polyp recurrence

2018 
Colorectal cancer is the second leading cause of cancer-related deaths in the United States. Colorectal cancer risk can be effectively managed through early detection and removal of precancerous lesions, known as colorectal polyps, with routine colonoscopy screening. The current guidelines for colonoscopy screening and surveillance do not consider detailed clinical information and polyp characteristics from prior colonoscopies. Developing a colonoscopy surveillance plan based on a patient’s personalized polyp recurrence risk is important for preventing the progression of colorectal cancer. To address this clinical need, we developed a natural language processing and machine learning approach to predict colorectal polyp recurrence risk using features derived from patient colonoscopy and pathology reports in electronic medical record systems. Colonoscopy records and the associated pathology reports, dated from 2011 to 2017, were obtained for 952 patients at a tertiary academic care center in New Hampshire. Polyp characteristics were extracted from these records using a natural language processing pipeline. The extracted features, along with demographic and anthropometric information, were used to develop six machine learning models and compare their ability to predict polyp recurrence. Our evaluation of these models showed an area under the curve (AUC) as high as 65% and highlighted the features, from both demographic and medical record sources, that are most important for predicting polyp recurrence. Our predictive analysis highlights the potential of personalized risk modeling for colorectal cancer screening, which can reduce unnecessary screenings, healthcare costs, and psychological stress while improving patient health outcomes.
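
To make the comparison metric concrete, here is a minimal sketch of how several models could be ranked by area under the curve. It assumes the abstract's "area under the curve" refers to the ROC curve, which the abstract does not state explicitly. The function name `roc_auc`, the model names, and all labels and scores are hypothetical and invented for illustration; none of them come from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data, NOT from the study: 1 = polyp recurred, 0 = no recurrence.
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical predicted recurrence risks from two placeholder models.
model_scores = {
    "model_a": [0.9, 0.2, 0.6, 0.4, 0.3, 0.5],
    "model_b": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative baseline
}

aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
print("best model:", best)
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking, so an AUC of 65% means that a randomly chosen patient with recurrence is ranked above a randomly chosen patient without recurrence about 65% of the time.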