Are Your Data Gathered?

Alban Siffer Univ. Rennes, Inria, CNRS, IRISA, Amossys
Pierre-Alain Fouque Univ. Rennes, CNRS, IRISA, IUF
Alexandre Termier Univ. Rennes, Inria, CNRS, IRISA
Christine Largou


This paper studies the problem of understanding data distributions. The authors propose the folding test of unimodality.


Understanding data distributions is one of the most fundamental research topics in data analysis. The literature provides many powerful statistical learning algorithms for gaining knowledge about the underlying distribution from multivariate observations. Such analyses may reveal a dependence between features, the appearance of clusters, or the presence of outliers. Before such deep investigations, we propose the folding test of unimodality. As a simple statistical description, it detects whether data are gathered or not (unimodal or multimodal). To the best of our knowledge, this is the first multivariate and purely statistical unimodality test. It makes no distribution assumption and relies only on a straightforward p-value. Through real-world data experiments, we show its relevance and how it could be useful for clustering.
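The abstract does not spell out how the test is computed, so the following is only a minimal sketch under stated assumptions, not the authors' reference implementation. It assumes the folding statistic is the variance of the distances to a pivot point, divided by the trace of the covariance matrix and normalized by the value that ratio takes for a uniform ball, 1/(1+d)^2. A value above 1 then suggests unimodal data and a value below 1 suggests multimodal data. The function name `folding_statistic` and the pivot formula are assumptions of this sketch, and the p-value computation is deliberately left out.

```python
import numpy as np


def folding_statistic(X):
    """Folding statistic of an (n, d) sample, under the assumptions stated above."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a flat array as n univariate observations
    n, d = X.shape

    # Sample covariance matrix, forced to shape (d, d) even when d == 1.
    cov = np.atleast_2d(np.cov(X, rowvar=False))

    # Sample covariance between each feature and the squared norm ||X||^2.
    sq_norm = np.sum(X ** 2, axis=1)
    centered = X - X.mean(axis=0)
    cross = centered.T @ (sq_norm - sq_norm.mean()) / (n - 1)

    # Assumed pivot: s = 0.5 * cov^{-1} * Cov(X, ||X||^2).
    pivot = 0.5 * np.linalg.solve(cov, cross)

    # "Fold" the data around the pivot by keeping only distances to it.
    folded = np.linalg.norm(X - pivot, axis=1)

    # Ratio of folded variance to total variance, scaled by (1 + d)^2 so that
    # a uniform ball sits at the boundary value 1.
    ratio = folded.var(ddof=1) / np.trace(cov)
    return ratio * (1 + d) ** 2


# Synthetic data for illustration only (not the paper's experiments).
rng = np.random.default_rng(0)
unimodal = rng.normal(size=(2000, 2))
bimodal = np.vstack([
    rng.normal(loc=(-5.0, 0.0), size=(1000, 2)),
    rng.normal(loc=(5.0, 0.0), size=(1000, 2)),
])

phi_unimodal = folding_statistic(unimodal)
phi_bimodal = folding_statistic(bimodal)
print(f"unimodal sample: {phi_unimodal:.2f}")
print(f"bimodal sample:  {phi_bimodal:.2f}")
```

On these synthetic samples the single Gaussian should score above 1 and the two-cluster mixture below 1. Turning the statistic into the p-value mentioned in the abstract needs the significance procedure from the full paper, which this sketch does not attempt to reproduce.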
