Word2Box: Learning Word Representation Using Box Embeddings.

Shib Sankar Dasgupta, Michael Boratko, Shriya Atmakuri, Xiang Lorraine Li, Dhruvesh Patel, Andrew McCallum

2021

Learning vector representations for words is one of the most fundamental topics in NLP; such representations capture syntactic and semantic relationships that are useful in a variety of downstream NLP tasks. Vector representations can be limiting, however, in that typical scoring functions such as dot-product similarity intertwine the position and magnitude of the vector in space. Recent work in representation learning has proposed alternative fundamental representations, such as distributions, hyperbolic vectors, or regions. Our model, Word2Box, takes a region-based approach to the problem of word representation, representing words as $n$-dimensional rectangles. These representations encode position and breadth independently and provide additional geometric operations, such as intersection and containment, that allow them to model co-occurrence patterns that vectors struggle with. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a qualitative analysis exploring the additional expressivity provided by Word2Box.
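
To make the geometric operations mentioned above concrete, the following is a minimal sketch of axis-aligned boxes with intersection, containment, and an overlap-volume score. It is an illustration only, not the Word2Box model or its training objective: the `Box` class, the `overlap_score` function, and the hand-picked 2-dimensional coordinates for `animal`, `dog`, and `bank` are all hypothetical and were not learned from data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: one lower and one upper corner."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def volume(self) -> float:
        # Product of side lengths; an empty side (upper < lower) gives 0.
        vol = 1.0
        for lo, hi in zip(self.lower, self.upper):
            vol *= max(hi - lo, 0.0)
        return vol

    def intersect(self, other: "Box") -> "Box":
        # The intersection of two boxes is again a box:
        # take the larger lower corner and the smaller upper corner.
        return Box(
            tuple(max(a, b) for a, b in zip(self.lower, other.lower)),
            tuple(min(a, b) for a, b in zip(self.upper, other.upper)),
        )

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box.
        return all(a <= b for a, b in zip(self.lower, other.lower)) and all(
            a >= b for a, b in zip(self.upper, other.upper)
        )


def overlap_score(a: Box, b: Box) -> float:
    """Illustrative similarity: volume of the intersection of two boxes."""
    return a.intersect(b).volume()


# Hand-picked toy coordinates (assumptions, not learned embeddings).
animal = Box((0.0, 0.0), (4.0, 4.0))
dog = Box((1.0, 1.0), (2.0, 3.0))
bank = Box((3.0, 3.0), (6.0, 5.0))

print(animal.contains(dog))        # True: the broad box encloses the narrow one
print(overlap_score(animal, dog))  # 2.0
print(overlap_score(dog, bank))    # 0.0: disjoint boxes have no overlap
```

In this toy setup the position of a box (its corners) and its breadth (its side lengths) vary independently, which is the property the abstract contrasts with dot-product scoring of vectors.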
