Predictive performance of international COVID-19 mortality forecasting models

2020 
Forecasting models have provided timely and critical information about the course of the COVID-19 pandemic, predicting both the timing of peak mortality and the total magnitude of mortality, which can guide health system response and resource allocation. Out-of-sample predictive validation, which checks how well past versions of forecasting models predict subsequently observed trends, provides insight into future model performance. As data and models are updated regularly, a publicly available, transparent, and reproducible framework is needed to evaluate them in an ongoing manner. We reviewed 384 published and unpublished COVID-19 forecasting models and evaluated seven models for which publicly available, multi-country, and date-versioned mortality estimates could be downloaded. These were the models produced by DELPHI-MIT (Delphi), Youyang Gu (YYG), the Los Alamos National Laboratory (LANL), and Imperial College London (Imperial), together with three models produced by the Institute for Health Metrics and Evaluation (IHME): a curve fit model (IHME-CF), a hybrid curve fit and epidemiological compartment model (IHME-CF SEIR), and a hybrid mortality spline and epidemiological compartment model (IHME-MS SEIR). Collectively, the models covered 169 countries, as well as the 50 states of the United States and Washington, D.C., and accounted for >99% of all reported COVID-19 deaths on July 20th, 2020. As expected, errors in mortality predictions increased with the number of weeks of extrapolation. For the most recent models, released in June, the best-performing model at six weeks of forecasting was the IHME-MS SEIR model, with a cumulative MAPE of 10.2%, followed by YYG (11.3%) and LANL (12.6%). Looking across models, errors in cumulative mortality predictions were highest in sub-Saharan Africa and lowest in high-income countries, reflecting differences in data availability and prediction difficulty in earlier vs. later stages of the epidemic.
For peak timing prediction, among models released in April, median absolute error values at six weeks ranged from 20 days for the Imperial model to 35 days for the YYG model. In sum, we provide a publicly available dataset and evaluation framework for assessing the predictive validity of COVID-19 mortality forecasts. We find substantial variation in predictive performance between models, and note large differences in average predictive validity between regions, highlighting priority areas for further study in sub-Saharan Africa and other emerging-epidemic contexts.
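The two error measures named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the example numbers are hypothetical, and because the abstract does not expand "MAPE", the sketch computes per-location absolute percent errors and leaves the choice of summary (mean or median) to the caller.

```python
from datetime import date
from statistics import mean, median


def absolute_percent_errors(predicted, observed):
    """Absolute percent error of cumulative deaths for each location.

    Locations with zero observed deaths are skipped, since the percent
    error is undefined there (an assumption of this sketch).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    errors = []
    for p, o in zip(predicted, observed):
        if o == 0:
            continue
        errors.append(100.0 * abs(p - o) / o)
    return errors


def peak_timing_errors(predicted_peaks, observed_peaks):
    """Absolute error, in days, between predicted and observed peak dates."""
    if len(predicted_peaks) != len(observed_peaks):
        raise ValueError("peak lists must have the same length")
    return [abs((p - o).days) for p, o in zip(predicted_peaks, observed_peaks)]


# Hypothetical six-week-ahead forecasts for three locations.
predicted_deaths = [110, 90, 200]
observed_deaths = [100, 100, 250]
ape = absolute_percent_errors(predicted_deaths, observed_deaths)

# Either summary can be reported; which one a study uses is its own choice.
mean_ape = mean(ape)
median_ape = median(ape)

# Hypothetical peak dates for two locations.
predicted_peaks = [date(2020, 4, 10), date(2020, 5, 1)]
observed_peaks = [date(2020, 4, 30), date(2020, 4, 21)]
median_peak_error_days = median(peak_timing_errors(predicted_peaks, observed_peaks))
```

Summarizing with a median makes the result less sensitive to a few locations with very large percent errors, which matters when small death counts inflate the ratio.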