Efficient Parallel Algorithm for Mining High Utility Patterns Based on Spark

Junqiang Liu, Rong Zhao, Xiangcai Yang, Yong Zhang, Xiaoning Jiang

2019

Utility pattern mining is an important data mining technique that finds patterns that are both statistically significant and in accordance with users' expectations and objectives. It emerged recently to address the limitation of frequent pattern mining. To the best of our knowledge, however, the utility pattern mining algorithms proposed so far are all sequential, and are therefore inefficient and not scalable when dealing with huge amounts of data. To address this scalability and efficiency challenge, this paper proposes a parallel utility pattern mining algorithm based on Spark, a parallel programming model that uses in-memory storage for data sharing. The contributions include an improved vertical data structure, a three-phase parallel mining framework, and an efficient algorithm. Extensive experiments on both artificial and real-world data show that the proposed parallel algorithm is scalable and efficient.
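
To make the contrast with frequent pattern mining concrete, the following is a minimal sequential sketch in Python. It is not the algorithm proposed in the paper and does not use Spark. It assumes the commonly used definition of utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset), which the abstract itself does not spell out. The toy transactions, the profit table, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit per item (an assumption, not from the paper).
unit_profit = {"a": 5, "b": 1, "c": 10}


def utility(itemset, transaction):
    """Utility of an itemset in one transaction; 0 if not fully contained."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def total_utility(itemset):
    """Utility of an itemset summed over the whole database."""
    return sum(utility(itemset, t) for t in transactions)


def support(itemset):
    """Number of transactions containing the itemset (frequent-pattern view)."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def high_utility_itemsets(min_utility):
    """Brute-force enumeration of all itemsets; exponential, for illustration only."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = total_utility(itemset)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    print(high_utility_itemsets(35))
    print(support(("b",)), total_utility(("b",)))
    print(support(("b", "c")), total_utility(("b", "c")))
```

In this toy data, item `b` appears in three transactions but has a total utility of only 4, while the itemset `(b, c)` appears in two transactions and has a total utility of 42. A superset can therefore have higher utility than its subset, so the support-based pruning used in frequent pattern mining does not carry over directly. The brute-force enumeration above grows exponentially with the number of items.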
