Kernel learning method for distance-based classification of categorical data

2014 
Kernel-based methods have become popular in machine learning; however, they are typically designed for numeric data. These methods are formulated in vector spaces, which are not defined for categorical data. In this paper, we propose a new kind of kernel trick, showing that mapping categorical samples into kernel spaces can alternatively be described as assigning a kernel-based weight to each categorical attribute of the input space, so that common distance measures can be employed. A data-driven approach to kernel bandwidth selection is then proposed, based on optimizing the feature weights. We also use the kernel-based distance measure to extend nearest-neighbor classification to categorical data. Experimental results on real-world data sets show that this approach outperforms classification in the original input space.
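To make the idea of attribute-weighted distances concrete, the following minimal sketch shows nearest-neighbor classification of categorical samples under a weighted mismatch distance. It is an illustration only, not the method of the paper: the abstract does not give the kernel-based weight formula or the bandwidth optimization, so the `weights` values, the toy data, and the function names `weighted_mismatch_distance` and `knn_predict` are all assumptions made for this example.

```python
from collections import Counter


def weighted_mismatch_distance(x, y, weights):
    """Sum the weights of the attributes on which x and y disagree."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def knn_predict(train_X, train_y, query, weights, k=1):
    """Majority vote among the k training samples closest to `query`."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: weighted_mismatch_distance(train_X[i], query, weights),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): attributes are (color, size, shape).
train_X = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("green", "small", "long"),
    ("green", "large", "long"),
]
train_y = ["apple", "apple", "bean", "bean"]

# Placeholder weights standing in for learned kernel-based weights;
# in the paper they would come from the data-driven bandwidth selection.
weights = [0.2, 0.1, 0.7]

query = ("green", "small", "round")
prediction = knn_predict(train_X, train_y, query, weights, k=1)
print(prediction)
```

In this sketch, an attribute with a larger weight contributes more to the distance when two samples disagree on it, so a well-chosen weighting lets the informative attributes dominate the neighbor search.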