Robust Character Labeling in Movie Videos: Data Resources and Self-supervised Feature Adaptation

2021 
Robust face clustering is a vital step in enabling a computational understanding of visual character portrayal in media. Face clustering for long-form content such as movies is challenging because of variations in appearance and a lack of large-scale labeled data resources. Our work focuses on two key aspects of this problem: the lack of domain-specific training or benchmark datasets, and adapting face embeddings learned on web images to the domain of movie videos. First, we curated over 169,000 face tracks from 240 Hollywood movies with weak labels on whether a pair of face tracks belongs to the same or different characters. We proposed an offline nearest-neighbor search in the embedding space to mine hard examples from these tracks. We then explored triplet-loss and multiview correlation-based methods for adapting face image embeddings to hard examples from movie videos. We also developed the SAIL-Movie Character Benchmark corpus to augment existing benchmarks with more racially diverse characters, and provided face-quality labels for subsequent error analysis. Our experimental results highlight the use of weakly labeled data for domain-specific feature adaptation. Overall, we found that multiview correlation-based adaptation yielded robust and more discriminative face embeddings. Its performance on downstream face verification and clustering tasks was comparable to state-of-the-art results in this domain. We hope that the large-scale datasets developed in this work can further advance automatic character labeling in videos. All resources are available at https://sail.usc.edu/~ccmi/multiface.
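The hard-example mining described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes precomputed track-level embeddings and weak same/different-character labels, and mines the hardest positive (farthest same-character embedding) and hardest negative (nearest different-character embedding) for each anchor, which then feed a standard triplet loss:

```python
import numpy as np

def mine_hard_triplets(embeddings, char_ids):
    """Offline nearest-neighbor hard-example mining (illustrative sketch).

    embeddings: (N, D) array of face-track embeddings.
    char_ids:   (N,) array of weak character labels.
    For each anchor, the hardest positive is the farthest embedding with
    the same label; the hardest negative is the nearest embedding with a
    different label.
    """
    # Pairwise Euclidean distances between all embeddings.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    triplets = []
    for a in range(len(embeddings)):
        same = (char_ids == char_ids[a])
        same[a] = False            # exclude the anchor itself
        other = ~same
        other[a] = False
        if not same.any() or not other.any():
            continue
        pos = np.where(same)[0][np.argmax(dist[a][same])]    # hardest positive
        neg = np.where(other)[0][np.argmin(dist[a][other])]  # hardest negative
        triplets.append((a, pos, neg))
    return triplets

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: positive should be closer than negative by a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In practice the paper operates on large track collections, where an approximate nearest-neighbor index would replace the exhaustive pairwise distance computation shown here.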