The Power of Second-Order Decision Tables

2002 
Abstract

The success of data mining techniques can be measured by the usefulness of the models they produce. Often these models must be explainable as well as accurate. While decision tables are easy to interpret and explain to virtually all users, there has been little study of whether such simple models are powerful enough for data mining. This paper presents SORCER, a learning system that induces second-order decision tables from a given data set. Second-order tables are database relations in which rows have sets of atomic values as components. With these value sets interpreted as disjunctions, second-order tables provide simple and compact representations that both enhance comprehensibility and facilitate efficient management. To further promote comprehensibility, SORCER, unlike many other learning systems, attempts to generate a minimal number of rows. SORCER's induction algorithm can be viewed as a table compression technique: a traditional table representing the training data is transformed into a second-order table with fewer rows by merging rows in ways that preserve consistency with the training data. We compare SORCER to three classification systems: C4.5, CBA, and a Naive Bayesian classifier. Experimental results on 26 data sets show that the average error rate obtained by SORCER, using a simple compression method, is lower than those of the Naive Bayesian classifier and C4.5 and is competitive with CBA's average error rate. Using a slightly more sophisticated compression, SORCER's average error rate is the lowest of all.
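The abstract's idea of compressing a table by merging rows while preserving consistency can be sketched in a few lines. The following is an illustrative toy, not SORCER's actual algorithm: rows are tuples of value sets (read as disjunctions), and two rows with the same class label are merged only if the merged row does not cover any training example with a different label. All names and the greedy merge order are assumptions for illustration.

```python
from itertools import combinations

def matches(row, example):
    """A second-order row matches an example if each attribute's value set
    contains the example's value (value sets read as disjunctions)."""
    return all(value in values for values, value in zip(row, example))

def merge(r1, r2):
    """Union the corresponding value sets of two rows."""
    return tuple(a | b for a, b in zip(r1, r2))

def compress(table, training):
    """Greedy row merging, a sketch of consistency-preserving compression.

    table:    dict mapping second-order rows -> class label
    training: list of (example, label) pairs the result must stay consistent with
    """
    rows = dict(table)
    changed = True
    while changed:
        changed = False
        for (r1, c1), (r2, c2) in combinations(list(rows.items()), 2):
            if c1 != c2:
                continue  # only merge rows that predict the same class
            m = merge(r1, r2)
            # consistency check: the merged row must not cover any
            # training example that carries a different label
            if all(not matches(m, ex) for ex, lab in training if lab != c1):
                del rows[r1], rows[r2]
                rows[m] = c1
                changed = True
                break  # restart scan over the updated row set
    return rows

# Tiny example: each training example starts as a row of singleton sets.
training = [(("sunny", "hot"), "no"),
            (("sunny", "mild"), "no"),
            (("rain", "mild"), "yes")]
table = {tuple(frozenset([v]) for v in ex): lab for ex, lab in training}
compressed = compress(table, training)
```

Here the two "no" rows merge into a single second-order row ({sunny}, {hot, mild}), shrinking the table from three rows to two while remaining consistent with all three training examples.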