Social Data Assisted Multi-Modal Video Analysis For Saliency Detection

2020 
Video saliency should be taken into consideration to facilitate optimization of the end-to-end video production, delivery and consumption ecosystem, improving user experience at lower cost. Although recent studies have significantly increased the accuracy of saliency prediction, the approaches are mostly video-centric and do not consider any prior "bias" that viewers may have toward the video contents. In this paper, we propose a novel learning-based multi-modal method for optimizing user-oriented video analysis. In particular, we generate a face-popularity mask using face recognition results and popularity information obtained from social media, and combine it with conventional content-only saliency analysis to produce multi-modal popularity-motion features. A convolutional long short-term memory (ConvLSTM) network discovers the temporal correlation of human attention across frames. Experiments show that our method outperforms state-of-the-art video saliency prediction approaches in representing human viewing preferences in real-world applications, and demonstrate the necessity as well as the potential of integrating user-bias information into attention detection.
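The abstract's fusion step can be sketched in a minimal form: detected face regions are weighted by the social-media popularity score of the recognized identity, and the resulting mask is blended with a content-only saliency map. This is a simplified illustration under assumed conventions, not the paper's actual formulation; the mixing weight `alpha`, the box format, and the max-based mask construction are all hypothetical choices.

```python
import numpy as np

def face_popularity_mask(frame_shape, face_boxes, popularity):
    """Build a mask where each detected face region carries the social-media
    popularity score of its recognized identity.
    Hypothetical construction: face_boxes are (y0, y1, x0, x1) regions and
    overlapping faces keep the larger score.
    """
    mask = np.zeros(frame_shape, dtype=np.float64)
    for (y0, y1, x0, x1), pop in zip(face_boxes, popularity):
        mask[y0:y1, x0:x1] = np.maximum(mask[y0:y1, x0:x1], pop)
    return mask

def fuse(saliency, pop_mask, alpha=0.5):
    """Blend content-only saliency with the face-popularity mask and
    renormalize to [0, 1]. alpha is an assumed mixing weight; the paper
    instead learns the combination inside the network.
    """
    fused = (1.0 - alpha) * saliency + alpha * pop_mask
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused
```

In the paper, the fused popularity-motion features would then be fed frame by frame into the ConvLSTM, which models how attention evolves over time.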