Heterogeneous spatio-temporal relation learning network for facial action unit detection

2022 
Properties of facial Action Units (AUs), such as their temporal relations and action relations, make AU detection different from general multi-label classification tasks. Capturing the spatial and temporal co-occurrence of AUs is therefore key to improving detection accuracy. Although many works on the spatial relations of AUs have been proposed in recent years, very few have explored their temporal relations. In this paper, we propose a Heterogeneous Spatio-Temporal Relation learning network (HSTR-Net) to capture both the temporal and spatial relations of AUs. The co-occurrence knowledge graph module guides the network to model spatio-temporal relations by introducing prior relation information, and the spatio-temporal Transformer module adaptively captures spatio-temporal relations through spatio-temporal feature interaction. To further model the temporal relations between AUs, we use the self-attention mechanism to fuse AU features in the time dimension. Extensive experiments are conducted on two challenging datasets, BP4D and DISFA, and the results show that our proposed HSTR-Net achieves performance comparable to the state of the art in AU detection. The code for our methods is available at .
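To make the idea of fusing AU features along the time dimension concrete, the following is a minimal sketch of scaled dot-product self-attention over the frames of a single AU. It is not the HSTR-Net implementation: the function name `temporal_self_attention`, the use of identity query/key/value projections (no learned weights), and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Fuse one AU's per-frame features across time.

    frames: list of T feature vectors, each of length D.
    Assumption: queries, keys, and values are the raw features
    (identity projections); a trained model would learn these.
    Returns T fused vectors, each a weighted average of all frames.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    fused = []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in frames
        ]
        weights = softmax(scores)
        # Attention-weighted average of the frame features.
        fused.append([
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)
        ])
    return fused


# Toy clip: three frames of a 2-dimensional feature for one AU (made-up values).
au_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fused = temporal_self_attention(au_features)
```

Each fused vector is a convex combination of the input frames, so a frame borrows most from the frames it resembles. In this toy clip the first frame draws mainly on the first two frames, while the third frame stays closest to itself.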