Improvement and Research of FP-Growth Algorithm Based on Distributed Spark

2015 
The FP-growth algorithm, a representative non-pruning algorithm, is widely used for mining transaction datasets. However, it is sensitive to both the amount of computation and the scale of the dataset. When building the FP-tree, the search operation is the main time-consuming step and has high complexity. When the horizontal or vertical dimension of the dataset is large, mining efficiency drops and mining may even fail. Two widely used strategies address these problems: reducing the time complexity of the search and applying distributed computing. This paper presents SPFP, a distributed algorithm based on the Spark framework and an improved FP-growth algorithm. Test results show that, compared with the PFP algorithm based on MapReduce, the OPFP algorithm based on Spark, and the original FP-growth algorithm, SPFP offers high efficiency, good cluster performance, and flexibility.
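To illustrate why the search step dominates FP-tree construction, the following is a minimal single-machine sketch, not the SPFP algorithm itself, whose details are not given in this abstract. It assumes that each node stores its children in a hash map, so finding the child for an item is a constant-time lookup on average rather than a scan over sibling nodes. The names `FPNode` and `build_fp_tree` are hypothetical and chosen only for this example.

```python
from collections import Counter


class FPNode:
    """One FP-tree node; children are keyed by item for fast lookup."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode (assumption: hash-map children)


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from an iterable of transactions (lists of items)."""
    # First pass: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction along a shared prefix path.
    for t in transactions:
        # Keep frequent items only, ordered by descending support,
        # with ties broken by item value so the order is deterministic.
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)  # the "search" step
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

For example, `build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], 2)` returns a tree whose root has a single child for item `b`, since `b` is the most frequent item and therefore leads every path. A distributed variant would additionally partition the work across workers, which this sketch does not attempt.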