Joint Expression Synthesis and Representation Learning for Facial Expression Recognition

2021 
Facial expression recognition (FER) is a challenging task due to large appearance variations and the lack of sufficient training data. Conventional deep approaches either learn a good representation through deep models or automatically synthesize images to enlarge the training set. In this paper, we perform both tasks jointly and propose an end-to-end deep model for simultaneous facial expression recognition and facial image synthesis. The proposed model is based on a Generative Adversarial Network (GAN) and enjoys several merits. First, the facial image synthesis and facial expression recognition tasks boost each other's performance through the unified model. Second, paired images are not required by our facial image synthesis network, which makes the proposed model far more general and flexible; meanwhile, the generated facial images greatly expand the training set and ease the overfitting problem in the FER task. Third, different expressions are encoded in a disentangled manner in a latent space, which enables us to synthesize facial images with arbitrary expressions by exchanging certain parts of their latent features. Quantitative and qualitative evaluations on both controlled and in-the-wild FER benchmarks (Multi-PIE, MMI, and RAF-DB) demonstrate the effectiveness of the proposed method on both the facial image synthesis and facial expression recognition tasks.
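To make the central architectural idea concrete, the latent code is split into an identity part and an expression part; recognition reads the expression part, and synthesis recombines parts across images. Below is a minimal PyTorch sketch of that idea only, not the authors' actual code: every module name, layer size, and the exact way the latent code is split are illustrative assumptions, and the adversarial discriminator and training losses of the full GAN are omitted for brevity.

```python
# Minimal sketch of disentangled expression synthesis + recognition.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a face image to a latent code split into identity and expression parts."""
    def __init__(self, id_dim=128, expr_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.fc = nn.Linear(128 * 8 * 8, id_dim + expr_dim)
        self.id_dim = id_dim

    def forward(self, x):
        z = self.fc(self.conv(x))
        # Disentangled code: first id_dim entries = identity, rest = expression.
        return z[:, :self.id_dim], z[:, self.id_dim:]

class Generator(nn.Module):
    """Reconstructs a face from a (possibly recombined) identity + expression code."""
    def __init__(self, id_dim=128, expr_dim=32):
        super().__init__()
        self.fc = nn.Linear(id_dim + expr_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),    # 32 -> 64
        )

    def forward(self, z_id, z_expr):
        h = self.fc(torch.cat([z_id, z_expr], dim=1)).view(-1, 128, 8, 8)
        return self.deconv(h)

# FER head: a classifier over the expression part of the code (7 basic expressions).
classifier = nn.Linear(32, 7)

enc, gen = Encoder(), Generator()
a, b = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)  # two unpaired batches
id_a, expr_a = enc(a)
id_b, expr_b = enc(b)

# Swapping expression codes transfers b's expression onto a's identity, so the
# synthesis branch can generate new expressive faces without paired images.
fake_a_with_b_expr = gen(id_a, expr_b)
logits = classifier(expr_a)  # recognition supervised on the expression code
print(fake_a_with_b_expr.shape, logits.shape)  # [4, 3, 64, 64], [4, 7]
```

Because the swap happens in latent space, the two input batches need not show the same subjects with matched expressions, which is why a synthesis branch of this kind can work without paired training images, and the synthesized faces can in turn be fed back as extra FER training data.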