Disentangling the Spatial Structure and Style in Conditional VAE

2020 
This paper proposes a structure for the conditional variational autoencoder (cVAE) that disentangles the latent vector into a spatial structure code and a style code, complementary to each other, with one ($z_s$) being label relevant and the other ($z_u$) label irrelevant. Unlike a traditional cVAE, our network maps the condition label to its relevant code $z_s$ through a separate module. Depending on whether the label directly relates to the spatial structure of the image, the $z_s$ output by the condition mapping module is used either as the style code, with two spatial dimensions of $1 \times 1$, or as the spatial structure code, with a single channel. Given the input image and its corresponding $z_s$, the encoder produces a posterior distribution close to a common prior regardless of the label, so $z_u$ sampled from it becomes label irrelevant. The decoder uses $z_s$ and $z_u$ through two typical adaptive normalization modules to reconstruct the input image. Results on two datasets with different types of labels show the effectiveness of our method.
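
The two shapes that $z_s$ can take, and the way a code of each shape could modulate decoder features, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The abstract does not name the two adaptive normalization modules, so the channel-wise and per-pixel modulation below are assumptions. The standard normal prior, the feature sizes, and all function names are likewise hypothetical, and the codes are used directly as scale and shift where a real network would pass them through learned layers.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x has shape (C, H, W); normalize each channel over its spatial positions.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adaptive_norm(x, gamma, beta):
    # gamma and beta are broadcast against the normalized features.
    return gamma * instance_norm(x) + beta

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ); the N(0, I) prior is an assumption.
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8                      # hypothetical decoder feature size
features = rng.normal(size=(C, H, W))

# Style code: many channels, spatial size 1 x 1 (one scale/shift per channel).
z_style = rng.normal(size=(C, 1, 1))
# Spatial structure code: a single channel, full spatial size (one scale/shift per pixel).
z_spatial = rng.normal(size=(1, H, W))

# Stand-in modulation: the codes themselves act as scale and shift.
out_style = adaptive_norm(features, gamma=1.0 + z_style, beta=z_style)
out_spatial = adaptive_norm(features, gamma=1.0 + z_spatial, beta=z_spatial)

# Label-irrelevant code: sampled from the encoder posterior by reparameterization.
mu, logvar = np.zeros(16), np.zeros(16)   # placeholder encoder outputs
z_u = mu + np.exp(0.5 * logvar) * rng.normal(size=16)
kl = kl_to_standard_normal(mu, logvar)
```

Both codes broadcast against the same feature map, which is why one condition mapping module can serve either role: only the shape of its output changes. The KL term is what would pull the posterior toward the shared prior for every label.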