Classification of categorical sequences

2009 
The classification of categorical sequences is a fundamental process in many application fields. A key issue is to extract and exploit the significant features hidden behind the chronological and structural dependencies found in these sequences. Almost all existing algorithms designed for this task rely on matching patterns in chronological order, yet sequences often share similar structural features in non-chronological order. In addition, these algorithms have serious difficulty outperforming domain-specific algorithms. In this paper we propose CLASS, a general approach for the classification of categorical sequences. CLASS captures the significant patterns and reduces the influence of those that merely represent noise. Moreover, CLASS employs a classifier called SNN (Significant-Nearest-Neighbours), inspired by K-Nearest-Neighbours but with a dynamic estimation of K. Extensive tests on a range of datasets from different fields show that CLASS is often competitive with domain-specific approaches.
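The abstract does not specify how CLASS extracts significant patterns or how SNN estimates K, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that sequences are compared by the contiguous subsequences (n-grams) they share regardless of position, which lets motifs match even when they appear in a different order, and that K is chosen per query as every training sequence whose similarity reaches a fixed fraction of the best match. The names `ngram_profile`, `similarity`, `classify` and the parameters `max_n` and `ratio` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def ngram_profile(seq: Sequence[Hashable], max_n: int = 3) -> Counter:
    """Count every contiguous subsequence of length 1..max_n.

    Positions are discarded, so two sequences containing the same motifs
    in a different order still share n-grams (hypothetical feature choice).
    """
    profile: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            profile[tuple(seq[i:i + n])] += 1
    return profile


def similarity(a: Counter, b: Counter) -> int:
    """Shared n-gram mass: sum of the per-pattern minimum counts."""
    return sum((a & b).values())


def classify(
    query: Sequence[Hashable],
    train: List[Tuple[Sequence[Hashable], str]],
    max_n: int = 3,
    ratio: float = 0.5,
) -> str:
    """Hypothetical nearest-neighbour vote with a per-query K.

    Assumes `train` is non-empty. The neighbourhood is every training
    sequence scoring at least `ratio` times the best score, so the number
    of neighbours K adapts to each query instead of being fixed in advance.
    """
    q = ngram_profile(query, max_n)
    scored = [(similarity(q, ngram_profile(s, max_n)), label) for s, label in train]
    best = max(score for score, _ in scored)
    neighbours = [label for score, label in scored if score >= ratio * best]
    # Majority vote among the dynamically selected neighbours.
    return Counter(neighbours).most_common(1)[0][0]


train = [("ABCABC", "x"), ("BCABCA", "x"), ("XYZXYZ", "y"), ("ZXYZXY", "y")]
print(classify("CABCAB", train))  # x
```

Here the query `"CABCAB"` shares 13 units of n-gram mass with each `"x"` sequence and none with the `"y"` sequences, so the threshold keeps exactly two neighbours. A fixed-K classifier with K = 3 would instead be forced to include one dissimilar sequence; the threshold rule avoids that, which is the kind of benefit a dynamic K is meant to bring.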