Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation

2020 
Molecule generation aims to design new molecules with specific chemical properties and to further optimize those properties. Following previous work, we encode molecules into continuous vectors in a latent space and then decode these embedding vectors back into molecules under the variational autoencoder (VAE) framework. We investigate the posterior collapse problem of the widely used RNN-based VAEs for molecule sequence generation. We point out, for the first time, that the underestimated reconstruction loss of VAEs leads to posterior collapse, and we provide both analytical and experimental evidence to support this finding. To fix the problem and avoid posterior collapse, we propose an effective and efficient solution. Without bells and whistles, our method achieves state-of-the-art reconstruction accuracy and a competitive validity score on the ZINC 250K dataset. When generating 10,000 unique valid molecule sequences by random sampling from the prior, JT-VAE takes 1450 seconds, whereas our method needs only 9 seconds on a regular desktop machine.
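To make the two competing terms of a sequence VAE objective concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not specify how the loss is re-balanced, so the weight `alpha` on the reconstruction term, the function names, and the toy inputs are all assumptions for illustration. The sketch only shows that when the reconstruction term is small relative to the KL term, the cheapest way to lower the total loss is to push the KL term to zero, which is the behavior associated with posterior collapse.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

def sequence_nll(token_probs):
    """Reconstruction term: negative log-likelihood summed over the tokens of one sequence.

    token_probs holds the probability the decoder assigned to each correct token.
    """
    return -sum(math.log(p) for p in token_probs)

def weighted_vae_loss(token_probs, mu, logvar, alpha=1.0):
    """Negative ELBO with a hypothetical weight alpha on the reconstruction term.

    alpha = 1.0 gives the standard VAE loss. alpha > 1.0 is one generic way to
    up-weight reconstruction; it is an assumption, not taken from the paper.
    """
    return alpha * sequence_nll(token_probs) + kl_to_standard_normal(mu, logvar)

# Toy example: a 2-token sequence and a 2-dimensional latent code.
probs = [0.5, 0.5]
loss_standard = weighted_vae_loss(probs, mu=[1.0, -1.0], logvar=[0.0, 0.0])
loss_weighted = weighted_vae_loss(probs, mu=[1.0, -1.0], logvar=[0.0, 0.0], alpha=4.0)
```

In this toy setting the KL term is identical in both calls, so the only difference between `loss_standard` and `loss_weighted` is how strongly the reconstruction error counts against the model.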