Learning User Interest with Improved Triplet Deep Ranking and Web-Image Priors for Topic-Related Video Summarization

2021 
Abstract Video summarization facilitates rapid browsing and efficient video indexing in many video-browsing applications, such as sports video highlights and dynamic video covers. In these applications, it is crucial to generate video summaries that capture the content users find interesting. While many existing methods generate video summaries from low-level features, this paper first proposes to mine large-scale Flickr images, identifying "interest" and "non-interest" images for the same query, to learn what users find interesting. Unlike existing pairwise ranking-based methods for video summarization, we then propose an improved triplet deep ranking model that converges more easily, learns the relationship between "interest" and "non-interest" Flickr images, and discovers which visual content of the original video users actually prefer. During training, triplets (interest image p+, interest image p′+, non-interest image p−) are fed into a model with three parallel deep convolutional networks. During summarization, an efficient entropy-based video segmentation method divides the original video into segments, and the visual interest score of each segment is estimated with the trained ranking network for summarization (SumNet). An optimal subset of segments is then selected to create a summary capturing interesting visual content. We evaluate and compare our method with several state-of-the-art methods; experimental results show that our method improves on the best baseline by 9.6% in mean Average Precision (mAP).
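The triplet training objective summarized above can be sketched as a standard triplet hinge loss over image embeddings: the two "interest" embeddings should lie closer together than the "interest"/"non-interest" pair by some margin. This is a minimal NumPy illustration under assumed conventions (squared Euclidean distance, a fixed margin); the paper's exact loss and network outputs may differ.

```python
import numpy as np

def triplet_ranking_loss(f_interest, f_interest2, f_non_interest, margin=1.0):
    """Hinge-style triplet loss for one triplet (p+, p'+, p-).

    f_interest, f_interest2 : embeddings of two "interest" images
    f_non_interest          : embedding of a "non-interest" image
    The loss is zero when the interest pair is already closer than the
    interest/non-interest pair by at least `margin`. (Illustrative sketch;
    the distance metric and margin value are assumptions.)
    """
    d_pos = np.sum((f_interest - f_interest2) ** 2)   # interest vs. interest
    d_neg = np.sum((f_interest - f_non_interest) ** 2)  # interest vs. non-interest
    return max(0.0, margin + d_pos - d_neg)
```

For example, embeddings where the interest pair coincides and the non-interest image is far away incur zero loss, while the reverse ordering is penalized.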