INCA: New statistic for estimating the number of clusters and identifying atypical units

2008 
This paper presents a solution to two problems that arise in the classification of data such as tumor types, samples of gene expression profiles, or general biomedical data. The first is to estimate the true number of clusters in a data set; the second is to decide whether a new unit belongs to one of the previously identified clusters or is an outlier (atypical unit). We propose a new statistic that addresses both problems. Because our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary or multi-attribute data) and has applications in many biomedical fields. We validated the approach on simulated examples and applied it to the diagnosis of dermal diseases and to the analysis of lymphatic cancer data, where it performed well. Copyright © 2007 John Wiley & Sons, Ltd.