Building risk prediction models for daily use of marijuana using machine learning techniques.

2021 
Standard statistical methods are limited in their ability to identify the characteristics of adults with recent marijuana use, so a different approach is needed. The objective of this study is to evaluate the performance of machine learning models in predicting daily marijuana use and to identify factors associated with daily use among adults. In 2020, the study analyzed pooled data from the 2016-2019 Behavioral Risk Factor Surveillance System (BRFSS) Survey. Prediction models were developed using four machine learning algorithms: Logistic Regression, Decision Tree, Random Forest (with the Gini function), and Naive Bayes. Respondents were randomly divided into training and testing samples. The performance of all the models was compared using accuracy, AUC, precision, and recall. The study included 253,569 respondents, of whom 10,182 (5.9 %) reported daily marijuana use in the last 30 days. Of the daily marijuana users, 53.4 % were young adults (age 18-34 years), 34.3 % were female, 56.1 % were non-Hispanic White, 15.2 % were college graduates, and 67.3 % were employed. Random Forest was the best-performing model with an AUC of 0.97, followed by Decision Tree (AUC 0.95). The most important factors for daily marijuana use were current use of e-cigarettes and combustible cigarettes, male gender, being unmarried, poor mental health, depression, cognitive decline, abnormal sleep pattern, and high-risk behavior. Data mining methods were useful for discovering behavioral health-risk knowledge and for visualizing the significance of predictive modeling from a multidimensional behavioral health survey.
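The four comparison metrics named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the ten-respondent toy data are all assumptions made for illustration, and the numbers have no connection to the BRFSS results. The sketch uses only the Python standard library and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted positives that are truly positive (0.0 if none predicted)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives that the model finds (0.0 if there are none)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 respondents, 2 of them daily users (label 1).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

# A trivial baseline that never predicts daily use.
always_negative = [0] * len(y_true)

print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("recall   :", recall(y_true, y_pred))
print("AUC      :", roc_auc(y_true, scores))
print("baseline accuracy:", accuracy(y_true, always_negative))
print("baseline recall  :", recall(y_true, always_negative))
```

In this toy example the always-negative baseline matches the scored model on accuracy while finding no daily users at all. That is one reason a study of a low-prevalence outcome would report AUC, precision, and recall alongside accuracy.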