On the Use of Textual and Visual Data from Online Social Networks for Predicting Community Health

2020 
Social media has been increasingly utilized as a powerful sensor for a wide range of healthcare applications, from infectious disease monitoring and psychiatric disorder detection to public health prediction. However, most existing social media-based approaches have so far relied on textual data mining, while imagery information, which has been shown to be an effective indicator of expression, status, and experience, remains under-explored. In this paper, we investigate the use of two typical data sources from social media, text and images, for predicting population health outcomes. Specifically, we propose two types of population health representations extracted from social media photos: color histograms, a hand-crafted visual feature set, and features learned automatically by a deep convolutional neural network. For evaluation, we benchmark the proposed visual feature sets against well-known textual features, including language style features and content-based features. To deal with weakly-labeled data, we apply multi-instance learning to reduce probable classification bias. These features are evaluated on the task of U.S. county-level health outcome prediction, using a large-scale dataset collected from Flickr and the Behavioral Risk Factor Surveillance System. We find that imagery information, previously investigated in studies of emotion analysis, is also an informative indicator of population health outcomes. These experiments, along with our in-depth analysis, serve as technical documentation for future research on population health analysis through the lens of social media.
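To illustrate the hand-crafted visual feature set mentioned above, the following is a minimal sketch of per-channel color histogram extraction for an RGB photo. The bin count (8 per channel) and the normalization scheme are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel color histogram for an RGB image.

    image: uint8 array of shape (H, W, 3), values in [0, 255].
    Returns an L1-normalized feature vector of length 3 * bins.
    """
    feats = []
    for channel in range(3):
        # Count pixel intensities of this channel into fixed-width bins
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        feats.append(hist)
    vec = np.concatenate(feats).astype(float)
    # Normalize so histograms are comparable across image sizes
    return vec / vec.sum()

# Example: a synthetic 4x4 all-red image
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255  # red channel at maximum
feat = color_histogram(img, bins=8)
print(feat.shape)  # (24,)
```

Such per-image vectors can then be aggregated (e.g., as bags of photo instances per county) before being fed to a multi-instance learner.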