On Identifiability in Transformers

2020 
In this work we contribute towards a deeper understanding of the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that attention weights are not unique and propose effective attention as an alternative for better interpretability. Furthermore, we show that input tokens retain their identity in the first hidden layers and then progressively become less identifiable. We also provide evidence for the role of non-linear activations in preserving token identity. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to further investigate Transformer models.
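The non-identifiability claim has a concrete linear-algebraic reading: whenever the number of tokens T exceeds a head's value dimension d_v, the value matrix has a non-trivial left null space, so different attention distributions can produce exactly the same head output. Below is a minimal sketch of the "effective attention" idea under that reading, using NumPy and hypothetical names (A for a T x T attention matrix, V for the T x d_v value matrix); it is an illustration, not the authors' reference implementation.

```python
import numpy as np

def effective_attention(A, V):
    """Remove the part of each attention row that lies in the left
    null space of V; that component cannot affect the output A @ V,
    so only the remainder ("effective attention") is informative.
    A: (T, T) attention weights, V: (T, d_v) value vectors.
    """
    # Orthogonal projector onto the column space (range) of V.
    P = V @ np.linalg.pinv(V)        # shape (T, T)
    A_eff = A @ P                    # satisfies A_eff @ V == A @ V
    return A_eff


if __name__ == "__main__":
    T, d_v = 8, 4                                  # more tokens than head dimension
    A = np.random.dirichlet(np.ones(T), size=T)    # row-stochastic attention
    V = np.random.randn(T, d_v)
    A_eff = effective_attention(A, V)
    # Different attention matrices, identical head output:
    assert np.allclose(A @ V, A_eff @ V)
```

Note that the rows of the projected matrix need not stay non-negative or sum to one; only the product with V is preserved, which is what makes the raw attention weights non-unique in the first place.

The mixing analysis mentioned at the end of the abstract relies on gradient attribution. A hedged sketch, assuming a hypothetical PyTorch model whose forward pass returns per-layer hidden states, could score each input token by the norm of the Jacobian block linking it to a chosen contextual embedding:

```python
import torch

def contribution_norms(model, x, layer, token_idx):
    """Gradient-attribution sketch: for the contextual embedding of
    token `token_idx` at `layer`, measure how strongly it depends on
    each input embedding via the Frobenius norm of the Jacobian block.
    x: (T, d) input embeddings; `model(x)` is assumed (hypothetically)
    to return a list of (T, d) hidden states, one per layer.
    """
    x = x.clone().requires_grad_(True)
    hidden = model(x)[layer][token_idx]          # (d,) contextual embedding
    norms = torch.zeros(x.shape[0])
    for k in range(hidden.shape[0]):
        grad, = torch.autograd.grad(hidden[k], x, retain_graph=True)
        norms += grad.pow(2).sum(dim=1)          # accumulate squared gradients per token
    return norms.sqrt()                          # one contribution score per input token
```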