MFF: Multi-modal feature fusion for zero-shot learning

2022 
Generative Zero-Shot Learning (ZSL) methods generally generate pseudo-samples/features conditioned on the semantic descriptions of unseen classes, thereby transforming ZSL into a traditional supervised learning task. Under this paradigm, the quality of the pseudo-samples/features guided by the class semantic descriptions is key to the model's success. However, the semantic information used in existing generative methods is mainly a low-dimensional representation of classes (e.g., attributes), which limits the quality of the generated pseudo-samples/features and may aggravate the domain shift problem. To alleviate this, we introduce visual principal component features, extracted by a principal component analysis network, to compensate for the deficiency of relying on semantic descriptions alone, and propose a novel Variational Auto-Encoder (VAE) and Generative Adversarial Network (GAN) based generative method for ZSL, which we call the Multi-modal Feature Fusion algorithm (MFF). In MFF, the input of multiple modalities enables the VAE to better fit the original data distribution, and the proposed alignment loss ensures consistency between the generated visual features and the corresponding semantic features. With the help of high-quality pseudo-samples/features, the ZSL model can make more accurate predictions for unseen classes. Extensive experiments on five public datasets demonstrate that our algorithm outperforms several state-of-the-art methods under both the ZSL and generalized ZSL settings.
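
To make the fusion idea concrete, below is a minimal PyTorch-style sketch of a conditional VAE whose condition is the concatenation of class attributes and visual principal component features, with an alignment term tying generated features back to the class semantics. The layer sizes, the concatenation-based fusion, the MSE form of the alignment loss, and the plain PCA stand-in for the paper's PCA network are all illustrative assumptions; the abstract does not specify these details, and the GAN branch is omitted.

```python
# Sketch only: architectural details are assumptions, not the paper's MFF model.
import torch
import torch.nn as nn
import torch.nn.functional as F


def visual_principal_components(features: torch.Tensor, k: int) -> torch.Tensor:
    """Project visual features onto their top-k principal directions
    (a simple PCA stand-in for the PCA-network features)."""
    centered = features - features.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(centered, q=k)   # v: principal directions
    return centered @ v[:, :k]


class FusionConditionalVAE(nn.Module):
    """Conditional VAE whose condition fuses (concatenates) class
    attributes with visual principal-component features."""

    def __init__(self, feat_dim=2048, attr_dim=85, pc_dim=64, latent_dim=64):
        super().__init__()
        cond_dim = attr_dim + pc_dim
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2 * latent_dim),          # -> mu, logvar
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim),
        )
        # Projects generated features to attribute space so an alignment
        # loss can enforce semantic consistency (assumed form).
        self.semantic_proj = nn.Linear(feat_dim, attr_dim)

    def forward(self, x, attributes, pc_feats):
        cond = torch.cat([attributes, pc_feats], dim=1)   # multi-modal fusion
        mu, logvar = self.encoder(torch.cat([x, cond], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.decoder(torch.cat([z, cond], dim=1))
        return x_hat, mu, logvar

    def loss(self, x, x_hat, mu, logvar, attributes):
        recon = F.mse_loss(x_hat, x)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # Alignment loss: generated features should map back onto the
        # attributes that conditioned them.
        align = F.mse_loss(self.semantic_proj(x_hat), attributes)
        return recon + kld + align


if __name__ == "__main__":
    feats = torch.randn(128, 2048)        # seen-class visual features
    attrs = torch.randn(128, 85)          # per-sample class attribute vectors
    pcs = visual_principal_components(feats, k=64)
    model = FusionConditionalVAE()
    x_hat, mu, logvar = model(feats, attrs, pcs)
    print(model.loss(feats, x_hat, mu, logvar, attrs).item())
```

At test time, such a generator would be conditioned on the attributes (and principal-component statistics) of unseen classes to synthesize pseudo-features, on which an ordinary classifier is then trained.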