Disentangling Representations of Text by Masking Transformers

2021 
Representations in large language models such as BERT encode a range of features into a single vector, which is predictive across a multitude of downstream tasks. In this paper, we explore whether it is possible to learn disentangled representations by identifying subnetworks in pre-trained models that encode distinct, complementary aspects of the representation. Concretely, we learn binary masks over transformer weights or hidden units to uncover the subset of features that correlate with a specific factor of variation; this sidesteps the need to train a disentangled model from scratch for a particular domain. We evaluate the ability of this method to disentangle representations of syntax and semantics, and of sentiment from genre, in the context of movie reviews. By combining this method with magnitude pruning, we find that we can identify quite sparse subnetworks. Moreover, we find that this disentanglement-via-masking approach performs as well as or better than previously proposed methods based on variational autoencoders and adversarial training.
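The abstract describes learning binary masks over transformer weights while keeping the pre-trained weights frozen. The sketch below illustrates one common way such a mask could be parameterized: real-valued per-weight scores binarized with a straight-through estimator. This parameterization, the `MaskedLinear` class, and all initialization values are assumptions for illustration, not necessarily the paper's exact scheme.

```python
# Minimal sketch, assuming a straight-through-estimator parameterization of the
# binary mask; the paper's actual masking scheme may differ.
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Wrap a frozen pre-trained linear layer with a learnable binary weight mask."""

    def __init__(self, pretrained: nn.Linear, threshold: float = 0.0):
        super().__init__()
        # Pre-trained weights stay fixed; only the mask scores are trained.
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = (
            nn.Parameter(pretrained.bias.detach(), requires_grad=False)
            if pretrained.bias is not None else None
        )
        # Real-valued scores, one per weight; initialized above the threshold so
        # the mask starts fully "on" (i.e., the full pre-trained subnetwork).
        self.scores = nn.Parameter(torch.full_like(self.weight, 0.1))
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.scores > self.threshold).float()  # binary mask, no gradient
        # Straight-through estimator: forward pass uses the hard binary mask,
        # while gradients flow through the real-valued scores unchanged.
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Hypothetical usage: substitute a transformer sublayer with its masked version and
# train only the mask scores against a probe loss for one factor of variation
# (e.g., sentiment), leaving the underlying BERT weights untouched.
layer = MaskedLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```

Because only the scores receive gradients, each factor of variation can be given its own mask over the same frozen backbone, which is what allows the subnetworks to be identified without retraining the model from scratch.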