Applying Bayesian hyperparameter optimization towards accurate and efficient topic modeling in clinical notes

2021 
The increased adoption of electronic health record systems has benefited downstream analyses in the clinical domain. However, identifying the patient data relevant to a targeted analysis remains a key challenge. In this study, we applied sequential model-based global optimization to tune the hyperparameters of latent Dirichlet allocation, a standard topic modeling technique. As a generalizable use case, we showcase the identification of physician notes specific to chronic lymphocytic leukemia treatment. Using each identified topic component as a pseudo binary classifier, our best predictive model achieved an area under the receiver operating characteristic curve of 0.9. Additionally, the metrics associated with hyperparameter tuning are in line with higher-level domain understanding. This study demonstrates the efficacy of hyperparameter tuning for topic modeling, and we make the generalized tool available at https://github.com/sema4hai/ehr-topic-model.
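The idea of treating a topic component as a pseudo binary classifier can be illustrated with a minimal sketch. It assumes a document-topic weight matrix (for example, the per-note topic proportions from a fitted LDA model) and binary labels marking which notes are relevant. Each topic's weights are used directly as classifier scores and evaluated with AUROC, computed here via its Mann-Whitney form (the probability that a relevant note outscores an irrelevant one, counting ties as one half). The function names `topic_auroc` and `best_topic` are hypothetical and are not taken from the ehr-topic-model tool.

```python
from typing import List, Sequence, Tuple


def topic_auroc(topic_weights: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC of one topic's per-document weights used as a classifier score.

    Hypothetical helper: uses the Mann-Whitney formulation, i.e. the fraction
    of (positive, negative) document pairs where the positive scores higher,
    with ties counted as 0.5.
    """
    pos = [w for w, y in zip(topic_weights, labels) if y == 1]
    neg = [w for w, y in zip(topic_weights, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative document")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_topic(doc_topic: List[List[float]], labels: Sequence[int]) -> Tuple[int, float]:
    """Return (topic_index, auroc) for the topic column that best separates labels."""
    n_topics = len(doc_topic[0])
    # Score every topic column independently as its own pseudo classifier.
    scores = [topic_auroc([row[k] for row in doc_topic], labels) for k in range(n_topics)]
    k_best = max(range(n_topics), key=lambda k: scores[k])
    return k_best, scores[k_best]


if __name__ == "__main__":
    # Toy data (not from the study): 4 notes, 2 topics; first two notes are relevant.
    doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
    labels = [1, 1, 0, 0]
    print(best_topic(doc_topic, labels))  # (0, 1.0)
```

In a tuning loop, a score like this (or another objective) could be what the hyperparameter optimizer tries to maximize for each candidate LDA configuration; which objective the study actually optimized is not stated here.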