Identifying online methods for monitoring foodborne illness: a scoping review of existing public health surveillance techniques

2018 
Background: Traditional methods of foodborne illness surveillance suffer from untimeliness and underreporting. In recent years, many studies have used online reviews and social media data to quantify and monitor the incidence of a disease or public health ailment. These consumer-generated data (CGD) sources have proven timelier than traditional general practitioner (GP) surveillance data. They can help to fill gaps in the reporting process and include additional metadata that is useful for supplementary research.

Objective: This review aims to identify and formally analyse research articles that use social media data and online reviews to quantify and monitor disease or public health ailments. Because such studies are scarce within the food safety domain, methods from other health-related fields that are transferable to foodborne illness surveillance are of particular interest.

Methods: Structured scoping methods were used to identify and analyse primary research articles that were published between 2002 and 2017 and used CGD for disease or public health surveillance. The title, abstract and keyword fields of five databases were searched using predetermined search terms. In total, 5239 papers matched the search criteria and underwent title screening for relevance. Following full-text review, data characterisation and thematic analysis were undertaken for the 62 studies deemed relevant. The following information was extracted and summarised: topic, geographic region, primary data type, corpus size, control data type (if used), keyword selection, methods, results, demographic analysis and limitations.

Results: The 62 articles proposed methods of calculating disease or ailment incidence in the population using social media or online reviews. The majority (40/62) focussed on the surveillance of influenza-like illness (ILI), and 10/62 focussed on the use of novel data for foodborne illness monitoring. Twitter data (58/62) and Yelp reviews (3/62) were the most common data sources. Studies reporting high correlations against baseline statistics used advanced statistical and computational approaches to calculate disease incidence. These included classification and regression approaches, clustering approaches and lexicon-based approaches. Classification approaches are computationally intensive because they require training data. However, studies employing them reported the best performance against published statistics.

Conclusions: By analysing the wider computer science literature in the context of digital epidemiology, this paper has identified and analysed methods that are transferable to foodborne disease surveillance. These methods fall into four main categories: (B) basic approaches, (R) classification and regression, (C) clustering approaches and (L) lexicon-based approaches. Simple studies that use only keyword occurrences to calculate disease incidence generally report good performance against baseline measures. However, they are sensitive to chatter generated by media reports. More computationally advanced approaches, such as machine learning and lexicon-based methods, are required to filter out spurious messages and protect predictive systems against false alarms. The majority of studies report two of the main challenges in building a model to quantify disease incidence: reducing false positives and minimising the model's sensitivity to such chatter. Research using CGD to monitor ILI is extensive. However, research on the use of online reviews and social media data in the context of food safety is limited. CGD is not only timelier than traditional data but may also circumvent problems associated with underreporting. Given the advantages reported in this review, methods employing CGD for foodborne disease surveillance warrant further investment.
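The following sketch contrasts a basic (B) keyword count with a filtered count. It is a toy illustration under stated assumptions and not a method taken from any reviewed study. The keywords, the first-person and media cue words, the posts and the baseline case counts are all hypothetical. The cue-word filter is a deliberately simple stand-in for the richer lexicon-based (L) or classification (R) filters the reviewed studies used. Weekly counts from each variant are compared with the baseline using Pearson's correlation coefficient.

```python
"""Toy comparison of a basic keyword count and a cue-word-filtered count.

Every keyword, cue word, post and baseline figure here is hypothetical.
"""
import math
import re

# Hypothetical symptom phrases for the basic (B) keyword approach.
KEYWORDS = {"food poisoning", "vomiting", "diarrhea", "stomach bug"}
# Hypothetical cue words: first-person tokens hint at a personal report,
# media tokens hint at news-driven chatter.
PERSONAL_CUES = {"i", "i'm", "my", "me"}
MEDIA_CUES = {"outbreak", "recall", "breaking", "news"}


def tokens(text: str) -> list[str]:
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def matches_keyword(text: str) -> bool:
    """True if any keyword phrase occurs as whole words in the text."""
    padded = " " + " ".join(tokens(text)) + " "
    return any(f" {k} " in padded for k in KEYWORDS)


def looks_personal(text: str) -> bool:
    """Keep posts with at least one first-person cue and no media cue."""
    toks = set(tokens(text))
    return bool(toks & PERSONAL_CUES) and not (toks & MEDIA_CUES)


def weekly_counts(weeks: list[list[str]], filtered: bool) -> list[int]:
    """Count keyword-matching posts per week, optionally applying the filter."""
    counts = []
    for posts in weeks:
        hits = [p for p in posts if matches_keyword(p)]
        if filtered:
            hits = [p for p in hits if looks_personal(p)]
        counts.append(len(hits))
    return counts


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if a series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical posts grouped by week; week 3 is a news-driven spike.
WEEKS = [
    ["I have food poisoning after lunch", "great pizza tonight"],
    ["My kid is vomiting all night", "I think I got a stomach bug"],
    ["BREAKING: food poisoning outbreak linked to salad",
     "News: diarrhea outbreak at festival",
     "Food poisoning recall announced"],
    ["Vomiting since yesterday, my worst week"],
]
BASELINE = [1, 2, 1, 1]  # hypothetical confirmed cases per week

if __name__ == "__main__":
    basic = weekly_counts(WEEKS, filtered=False)
    lexicon = weekly_counts(WEEKS, filtered=True)
    print("basic:   ", basic, round(pearson(basic, BASELINE), 3))
    print("filtered:", lexicon, round(pearson(lexicon, BASELINE), 3))
```

In this invented data, the media-driven posts in week 3 inflate the raw keyword count, but the filter removes them. As a result, the filtered series tracks the hypothetical baseline more closely. This mirrors the false-alarm problem described above. It says nothing about the size of the effect on real data.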