TVENet: Temporal variance embedding network for fine-grained action representation

2020 
Abstract With the breakthroughs in general action understanding, it has become an inevitable trend to analyze the actions in finer granularity. However, related researches have been largely hindered by the lack of fine-grained datasets and the difficulty of capturing subtle differences between fine-grained actions that are highly similar overall. In this paper, we address the above challenges by constructing a fine-grained action dataset, i.e., Figure Skating, which can be used for end-to-end network training and presenting a framework for the joint optimization of classification and similarity constraints. We propose to incorporate the triplet loss into the training of Convolutional Neural Network, which learns a mapping from fine-grained actions to a compact Euclidean space where distances directly correspond to a measure of action similarity. Triplet loss compels actions of distinct classes to have larger distances than actions of the same class. Besides, to boost the discrimination of the fine-grained actions, we further propose a temporal variance embedding network (TVENet) embedding temporal context variances into the feature embeddings during the joint network training. The experimental results on Figure Skating dataset, HMDB51 dataset as well as UCF101 dataset demonstrate the effectiveness of TVENet representation for fine-grained action search.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    71
    References
    4
    Citations
    NaN
    KQI
    []