Efficient Parallel Algorithm for Mining High Utility Patterns Based on Spark

2019 
Utility pattern mining is an important data mining technique that finds patterns that are both statistically significant and aligned with users' expectations and objectives; it emerged recently to address the limitations of frequent pattern mining. To the best of our knowledge, however, the utility pattern mining algorithms proposed so far are all sequential, which makes them inefficient and hard to scale to huge amounts of data. To address these scalability and efficiency challenges, this paper proposes a parallel utility pattern mining algorithm based on Spark, a parallel programming model that uses in-memory storage for data sharing. The contributions include an improved vertical data structure, a three-phase parallel mining framework, and an efficient algorithm. Extensive experiments on both artificial and real-world data show that the proposed parallel algorithm is scalable and efficient.
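
To illustrate how utility differs from frequency, the following is a minimal sketch of the utility measure commonly used in high utility pattern mining (quantity times unit profit, summed over the transactions that contain the itemset). It is not the parallel algorithm proposed in the paper; the toy data and the names `transactions`, `unit_profit`, `itemset_utility`, and `support` are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]

# Hypothetical unit profit per item.
unit_profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def support(itemset: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Count the transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in db if items.issubset(t))


def itemset_utility(
    itemset: Iterable[str],
    db: List[Dict[str, int]],
    profit: Dict[str, int],
) -> int:
    """Sum quantity * unit profit over all transactions containing the itemset."""
    items = set(itemset)
    total = 0
    for t in db:
        if items.issubset(t):
            total += sum(t[i] * profit[i] for i in items)
    return total


if __name__ == "__main__":
    for candidate in (("a", "b"), ("a", "c")):
        print(
            candidate,
            "support =", support(candidate, transactions),
            "utility =", itemset_utility(candidate, transactions, unit_profit),
        )
```

In this toy database both candidate itemsets occur in the same number of transactions, yet their utilities differ, which is the kind of distinction frequent pattern mining alone cannot make.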