On Identifiability in Transformers

2020 
In this work we contribute towards a deeper understanding of the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that attention weights are not unique and propose effective attention as an alternative for better interpretability. Furthermore, we show that input tokens retain their identity in the first hidden layers and then progressively become less identifiable. We also provide evidence for the role of non-linear activations in preserving token identity. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to further investigate Transformer models.
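The non-identifiability claim has a concrete linear-algebraic reading: whenever the number of tokens T exceeds a head's value dimension d_v, the value matrix has a non-trivial left null space, so different attention distributions can produce exactly the same head output. Below is a minimal sketch of the "effective attention" idea under that reading, using NumPy and hypothetical names (A for a T x T attention matrix, V for the T x d_v value matrix); it is an illustration, not the authors' reference implementation.

```python
import numpy as np

def effective_attention(A, V):
    """Remove the part of each attention row that lies in the left
    null space of V; that component cannot affect the output A @ V,
    so only the remainder ("effective attention") is informative.
    A: (T, T) attention weights, V: (T, d_v) value vectors.
    """
    # Orthogonal projector onto the column space (range) of V.
    P = V @ np.linalg.pinv(V)        # shape (T, T)
    A_eff = A @ P                    # satisfies A_eff @ V == A @ V
    return A_eff


if __name__ == "__main__":
    T, d_v = 8, 4                                  # more tokens than head dimension
    A = np.random.dirichlet(np.ones(T), size=T)    # row-stochastic attention
    V = np.random.randn(T, d_v)
    A_eff = effective_attention(A, V)
    # Different attention matrices, identical head output:
    assert np.allclose(A @ V, A_eff @ V)
```

Note that the rows of the projected matrix need not stay non-negative or sum to one; only the product with V is preserved, which is what makes the raw attention weights non-unique in the first place.

The mixing analysis mentioned at the end of the abstract relies on gradient attribution. A hedged sketch, assuming a hypothetical PyTorch model whose forward pass returns per-layer hidden states, could score each input token by the norm of the Jacobian block linking it to a chosen contextual embedding:

```python
import torch

def contribution_norms(model, x, layer, token_idx):
    """Gradient-attribution sketch: for the contextual embedding of
    token `token_idx` at `layer`, measure how strongly it depends on
    each input embedding via the Frobenius norm of the Jacobian block.
    x: (T, d) input embeddings; `model(x)` is assumed (hypothetically)
    to return a list of (T, d) hidden states, one per layer.
    """
    x = x.clone().requires_grad_(True)
    hidden = model(x)[layer][token_idx]          # (d,) contextual embedding
    norms = torch.zeros(x.shape[0])
    for k in range(hidden.shape[0]):
        grad, = torch.autograd.grad(hidden[k], x, retain_graph=True)
        norms += grad.pow(2).sum(dim=1)          # accumulate squared gradients per token
    return norms.sqrt()                          # one contribution score per input token
```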