Gaussian Mixture Variational Autoencoder for Semi-Supervised Topic Modeling

2020 
Topic models are widely used to summarize a corpus of documents. Recent advances in Variational AutoEncoders (VAEs) have enabled black-box inference methods for topic modeling that alleviate the drawbacks of classical statistical inference. Most existing VAE-based approaches assume a unimodal Gaussian distribution for the approximate posterior of the latent variables, which limits the flexibility of the latent-space encoding. In addition, the unsupervised architecture hinders the incorporation of extra label information, which is ubiquitous in many applications. In this paper, we propose a semi-supervised topic model under the VAE framework. We assume that a document is modeled as a mixture of classes, and a class is modeled as a mixture of latent topics. A multimodal Gaussian mixture model is adopted for the latent space, with the component parameters and the mixing weights encoded separately. These weights, together with the partially labeled data, also contribute to the training of a classifier. The training objective is derived under the Gaussian mixture assumption and the semi-supervised VAE framework, and the modules of the proposed framework are designed accordingly. Experiments on three benchmark datasets demonstrate the effectiveness of our method compared to several competitive baselines.
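To make the described architecture concrete, the following is a minimal, illustrative sketch, not the authors' implementation: it assumes PyTorch, a bag-of-words input, and hypothetical names and layer sizes (GMVAETopicModel, hidden=256, etc.). It shows an encoder that produces per-class Gaussian component parameters and class mixing weights separately, mixes the sampled components into topic proportions, and exposes the mixing weights as classifier logits that could additionally be supervised on the labeled subset.

```python
# Illustrative sketch only (assumed PyTorch API; names and sizes are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMVAETopicModel(nn.Module):
    def __init__(self, vocab_size, num_classes, num_topics, hidden=256):
        super().__init__()
        # Shared document encoder over the bag-of-words input.
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
        )
        # Mixing weights over classes (also usable as classifier logits).
        self.class_logits = nn.Linear(hidden, num_classes)
        # Per-class Gaussian component parameters for the latent topic space.
        self.mu = nn.Linear(hidden, num_classes * num_topics)
        self.logvar = nn.Linear(hidden, num_classes * num_topics)
        # Decoder: topic proportions -> word distribution.
        self.decoder = nn.Linear(num_topics, vocab_size)
        self.num_classes, self.num_topics = num_classes, num_topics

    def forward(self, bow):
        h = self.encoder(bow)
        pi = F.softmax(self.class_logits(h), dim=-1)           # class mixing weights
        mu = self.mu(h).view(-1, self.num_classes, self.num_topics)
        logvar = self.logvar(h).view(-1, self.num_classes, self.num_topics)
        # Reparameterized sample from each Gaussian component.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Mix the components with the class weights, then map to topic proportions.
        z_mix = (pi.unsqueeze(-1) * z).sum(dim=1)
        theta = F.softmax(z_mix, dim=-1)                        # topic proportions
        word_logits = self.decoder(theta)
        return word_logits, pi, mu, logvar

# Toy usage: reconstruction term of an ELBO-style objective; on labeled documents,
# pi could also receive a cross-entropy loss for the semi-supervised classifier.
model = GMVAETopicModel(vocab_size=2000, num_classes=5, num_topics=50)
bow = torch.rand(8, 2000)
word_logits, pi, mu, logvar = model(bow)
recon = -(F.log_softmax(word_logits, dim=-1) * bow).sum(dim=-1).mean()
```

The KL terms of the mixture objective and the weighting between the unsupervised and supervised losses are omitted here; they would follow from the paper's derivation.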