Collaborative Deep Metric Learning For Video Understanding

Authors:
Joonseok Lee Google
Sami Abu-El-Haija Google
Balakrishnan Varadarajan Google
Paul Natsev Google

Introduction:

The goal of video understanding is to develop algorithms that enable machines understand videos at the level of human experts. The authors propose a deep network that embeds videos using their audio-visual content, onto a metric space which preserves video-to-video relationships.

Abstract:

The goal of video understanding is to develop algorithms that enable machines understand videos at the level of human experts. Researchers have tackled various domains including video classification, search, personalized recommendation, and more. However, there is a research gap in combining these domains in one unified learning framework. Towards that, we propose a deep network that embeds videos using their audio-visual content, onto a metric space which preserves video-to-video relationships. Then, we use the trained embedding network to tackle various domains including video classification and recommendation, showing significant improvements over state-of-the-art baselines. The proposed approach is highly scalable to deploy on large-scale video sharing platforms like YouTube.

You may want to know: