Analysis and Implementation of MapReduce Parallelization of Naïve Bayes Algorithm

2013 
Nave Bayes is an efficient algorithm.Due to the limitation of memory and I/O resources,the efficiency of the algorithm has been greatly affected in mass data processing.In this paper,proposed a novel Nave Bayes algorithm based on MapReduce programming model.Training set is cut apart before being processed.The core processing procedure is accomplished by MapReduce model.Extraction and parsing of the training set are processed in the Map function.Knowledge base of class and feature attributes are built in the Reduce function.In the experiments,mainly compare the performance of both the traditional algorithm and the improved parallel algorithm.The result of experiments shows that the parallel Nave Bayes algorithm has good efficiency and high scalability in mass data processing.