A geometric-based data reduction approach for large low dimensional datasets: Delaunay triangulation in SVM algorithms

2021 
Abstract Training a support vector machine (SVM) on large datasets is a slow, daunting process, and SVM is also slow in the testing phase because of its large number of support vectors (SVs). This paper proposes an effective geometric algorithm that constructs a Delaunay triangulation (DT) using the Quickhull algorithm, together with a novel strategy to exactly identify and extract the boundary data points lying between the two classes of a dataset; these most informative points are then used as a reduced dataset to solve various SVM algorithms, yielding new DT-SVM algorithms. Two synthetic datasets, ranging in size from 1K incrementally up to 500K samples, are generated to extensively verify the effectiveness of the proposed DT-SVM algorithms across data sizes, and for further assessment the most efficient version of the proposed DT-SVM is applied to well-known benchmark datasets from the UCI Machine Learning Repository. Two variants of the sequential minimal optimization (SMO) decomposition method, in addition to the least-squares form of SVM, are implemented to demonstrate the scalability of the new DT-SVM algorithms on linearly/nonlinearly separable and non-separable large low-dimensional datasets. Moreover, the most efficient version of the proposed algorithm is compared to RCH-SK, a known geometric approach in the SVM literature. The results demonstrate that, while the proposed approach improves the scalability of DT-SVM on large low-dimensional datasets, it allows the SVM algorithms to maintain accuracy within an acceptable range at considerably lower cost in both the training and testing phases, using a noticeably smaller number of SVs.
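The core idea described above, reducing the training set to the points that border the opposite class in a Delaunay triangulation before fitting an SVM, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses `scipy.spatial.Delaunay` (which wraps the Qhull library, the same family as Quickhull) and keeps every vertex of a simplex whose vertices carry mixed class labels; the function name `dt_boundary_reduce` and all parameters are assumptions.

```python
# Hedged sketch of DT-based boundary extraction for SVM data reduction.
# Assumptions: scipy's Qhull-backed Delaunay stands in for the paper's
# Quickhull-based DT construction; "boundary points" are approximated as
# vertices of mixed-class simplices.
import numpy as np
from scipy.spatial import Delaunay
from sklearn.datasets import make_moons
from sklearn.svm import SVC

def dt_boundary_reduce(X, y):
    """Return indices of points that share a DT simplex with the other class."""
    tri = Delaunay(X)                          # Delaunay triangulation via Qhull
    keep = set()
    for simplex in tri.simplices:
        if y[simplex].min() != y[simplex].max():  # simplex spans both classes
            keep.update(int(v) for v in simplex)  # its vertices lie near the boundary
    return np.fromiter(keep, dtype=int)

# Train an SVM on the reduced set only.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
idx = dt_boundary_reduce(X, y)
clf = SVC(kernel="rbf").fit(X[idx], y[idx])
print(f"reduced {len(X)} -> {len(idx)} points, "
      f"accuracy on full set: {clf.score(X, y):.3f}")
```

Because only the inter-class boundary region is retained, both the quadratic-programming problem in training and the number of resulting SVs shrink, which is the mechanism behind the reported speedups.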