Novel metrics for computing semantic similarity with sense embeddings

2020 
Abstract In recent years, much effort has been devoted to building word embeddings: a representational device in which word meanings are described as dense unit vectors of real numbers in a continuous, high-dimensional Euclidean space, where similarity can be interpreted as a metric. Sense-level embeddings were subsequently proposed to describe the meaning of senses rather than terms. More recently, additional intermediate representations have been designed that provide a vector description for ⟨term, sense⟩ pairs, mapping both term and sense descriptions onto a shared semantic space. Surprisingly, however, this wealth of approaches and resources has not been matched by a parallel refinement of the metrics used to compute semantic similarity: to date, the semantic similarity between two input entities is mostly computed by maximizing some angular similarity measure over vector pairs, typically cosine similarity. In this work we introduce two novel similarity metrics for comparing sense-level representations, and we show that by exploiting the features of sense embeddings it is possible to substantially improve on existing strategies, obtaining enhanced correlation with human similarity ratings. Additionally, we argue that semantic similarity needs to be complemented by a further task: identifying the senses underlying the similarity rating. We experimentally verified that the proposed metrics are beneficial for both the semantic similarity task and the sense identification task. The experiments also provide a detailed how-to illustrating how six important sets of sense embeddings can be used to implement the proposed similarity metrics.
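
The baseline the abstract refers to, maximizing cosine similarity over all cross-word sense pairs, also yields the sense identification for free: the maximizing pair is the identified sense pair. The paper's two novel metrics are not specified in the abstract, so the sketch below illustrates only that standard MaxSim-style baseline; the function names, the dictionary-based sense inventory, and the toy vectors are hypothetical placeholders, not the paper's actual data structures.

```python
# Minimal sketch of the max-cosine baseline over sense embeddings.
# Assumption: each word is represented by a mapping from sense identifiers
# to dense embedding vectors (placeholders, not the paper's resources).
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two dense vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def max_sense_similarity(senses_a: dict, senses_b: dict):
    """Return (best score, (sense id from a, sense id from b)).

    Scores every cross-word sense pair and keeps the maximum, so the
    similarity rating and the identified senses come from the same search.
    """
    best_score, best_pair = -1.0, None
    for sid_a, vec_a in senses_a.items():
        for sid_b, vec_b in senses_b.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_score, best_pair = score, (sid_a, sid_b)
    return best_score, best_pair

# Toy usage with two senses for "bank" (vectors are illustrative only).
bank = {"bank.n.financial": np.array([0.9, 0.1, 0.0]),
        "bank.n.river":     np.array([0.0, 0.2, 0.9])}
money = {"money.n.currency": np.array([0.8, 0.3, 0.1])}

score, (sense_a, sense_b) = max_sense_similarity(bank, money)
print(f"similarity={score:.3f} via {sense_a} / {sense_b}")
```

On a pair such as bank/money, this baseline correctly picks the financial sense of "bank"; the paper's contribution is to refine this pairwise-maximization scheme with metrics that better exploit the structure of sense embeddings.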