Applying random forest in a health administrative data context: a conceptual guide

2021 
This guide introduces Random Forest (RF), a machine learning method, in an accessible way for health services researchers and highlights considerations specific to applying it to health administrative data. The data are physician claims from the universal public insurer, linked with the Canadian Community Health Survey, for the Canadian province of Quebec. We describe in detail how RF can be useful in health services research, provide guidance on data setup and modeling decisions, and demonstrate how to interpret results. We also highlight specific considerations for applying RF to health administrative data. In a working example, we compare RF with logistic regression, ridge regression, and LASSO in their ability to predict whether a person has a regular medical doctor. We use survey responses to “Do you have a regular medical doctor?” from three cycles of the Canadian Community Health Survey (2007, 2009, 2011). Responses are linked with physician claims data from 2002 to 2012. We limit our cohort to persons aged 40 years and older at the time of responding to the survey. We discuss the strengths and weaknesses of using RF in a health services research setting compared with more conventional modeling techniques. Applying an RF model in a health services research setting can have advantages over conventional modeling approaches, and we encourage health services researchers to add RF to their toolbox of predictive modeling methods.