A Bayesian Method for Robust Estimation of Distributional Similarities

Jun’ichi Kazama, Stijn De Saeger, Kow Kuroda, Masaki Murata, Kentaro Torisawa

2010

Existing word similarity measures are not robust to data sparseness because they rely only on point estimates of words' context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robustly estimating distributional word similarities. The method uses a distribution over context profiles obtained by Bayesian estimation and takes the expectation of a base similarity measure under that distribution. When the context profiles are multinomial distributions, the priors are Dirichlet, and the base measure is the Bhattacharyya coefficient, we can derive an analytical form that allows efficient calculation. For the task of word similarity estimation using a large amount of Japanese Web data, we show that the proposed measure achieves higher accuracy than other well-known similarity measures.
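
The analytical form itself is not reproduced here. As a minimal sketch of how such an expectation can be computed, the Python below assumes a symmetric Dirichlet prior, independent posteriors for the two words, and a known number of distinct contexts. It relies on the standard fractional moment of a Dirichlet component, E[sqrt(p_k)] = Γ(α_0) Γ(α_k + 1/2) / (Γ(α_0 + 1/2) Γ(α_k)), where α_0 is the sum of the posterior parameters. The function names `log_sqrt_moment` and `expected_bc` and the parameters `num_contexts` and `prior` are hypothetical and chosen for illustration only; this is not necessarily the exact formulation used in the paper.

```python
import math


def log_sqrt_moment(alpha_k, alpha_0):
    """Return log E[sqrt(p_k)] for p ~ Dirichlet(alpha), with alpha_0 = sum(alpha)."""
    return (math.lgamma(alpha_0) - math.lgamma(alpha_0 + 0.5)
            + math.lgamma(alpha_k + 0.5) - math.lgamma(alpha_k))


def expected_bc(counts_a, counts_b, num_contexts, prior=1.0):
    """Expected Bhattacharyya coefficient under two independent Dirichlet posteriors.

    counts_a, counts_b: dicts mapping context -> observed count (sparse).
    num_contexts: total number of distinct contexts (assumed known).
    prior: symmetric Dirichlet hyperparameter (an assumption of this sketch).
    """
    # Posterior parameter for context k is prior + count_k.
    a0 = prior * num_contexts + sum(counts_a.values())
    b0 = prior * num_contexts + sum(counts_b.values())

    seen = set(counts_a) | set(counts_b)
    total = 0.0
    for context in seen:
        total += math.exp(
            log_sqrt_moment(prior + counts_a.get(context, 0), a0)
            + log_sqrt_moment(prior + counts_b.get(context, 0), b0)
        )

    # Contexts observed with neither word all have posterior parameter `prior`,
    # so their identical contributions can be added in one step.
    unseen = num_contexts - len(seen)
    if unseen > 0:
        total += unseen * math.exp(
            log_sqrt_moment(prior, a0) + log_sqrt_moment(prior, b0)
        )
    return total
```

With two contexts, `prior=1.0`, and no observations for either word, each profile is uniform on the simplex, E[sqrt(p_k)] = 2/3, and `expected_bc({}, {}, 2)` evaluates to 8/9. Working in log space with `math.lgamma` avoids overflow of the Gamma function for large counts, and grouping the unseen contexts keeps the cost proportional to the number of observed contexts rather than to the full context vocabulary.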

Keywords:

Bayes estimator
Point estimation
Computer science
Dirichlet distribution
Prior probability
Statistics
Similarity measure
Multinomial distribution
Robust statistics
Pattern recognition
Artificial intelligence
Bhattacharyya distance
Bayesian probability
