Mining contrast sequential pattern based on subsequence time distribution variation with discreteness constraints

2019 
A contrast sequential pattern is a pattern that occurs frequently in one sequence dataset but not in the others. Contrast sequential pattern mining has been widely used in fields such as customer behavior analysis and medical diagnosis. Existing algorithms require users to first set a distinguishing location and then use this fixed location to identify differences in how subsequences are distributed, i.e., subsequence patterns that appear before the given distinguishing location in one sequence dataset and after the same location in another. However, it is difficult for users to choose an appropriate location without sufficient prior knowledge. Moreover, because the distinguishing location differs from one subsequence to another, a single fixed location may cause many meaningful patterns to be missed. In addition, previous studies have rarely considered the time distribution variation of subsequences or the discreteness of patterns. To address these problems, this paper proposes a novel method for mining contrast sequential patterns based on subsequence time distribution variation with discreteness constraints. A suffix-tree-based search algorithm, which transforms the dataset into a tree representation, is designed to mine contrast sequential patterns based on subsequence time distribution variation. Experiments on real-world time-series datasets show that the proposed method outperforms other state-of-the-art methods in both effectiveness and efficiency.
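The abstract does not give the paper's exact definitions, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes the time series have already been discretized into symbol strings, treats patterns as contiguous subsequences of at most `max_len` symbols, and approximates "time distribution variation" by the difference in mean relative start position of a pattern between two datasets, so no fixed distinguishing location is needed. The function names and the thresholds `min_sup` and `min_shift` are hypothetical, the discreteness constraint is not modeled, and a naive bounded-depth suffix trie stands in for a true path-compressed suffix tree.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical sketch, not the paper's algorithm. Assumes time series are
# already discretized into symbol strings and patterns are contiguous.


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    seq_ids: set = field(default_factory=set)      # sequences containing the pattern
    positions: list = field(default_factory=list)  # relative start of each occurrence


def build_suffix_trie(sequences, max_len):
    """Naive suffix trie truncated at depth max_len (no path compression)."""
    root = Node()
    for sid, seq in enumerate(sequences):
        n = len(seq)
        for start in range(n):
            rel = start / max(n - 1, 1)  # start position scaled to [0, 1]
            node = root
            # Insert the suffix beginning at `start`, cut to max_len symbols.
            for sym in seq[start:start + max_len]:
                node = node.children.setdefault(sym, Node())
                node.seq_ids.add(sid)
                node.positions.append(rel)
    return root


def collect(root, n_seqs):
    """Map each pattern (tuple of symbols) to (support, occurrence positions)."""
    out, stack = {}, [(root, ())]
    while stack:
        node, pat = stack.pop()
        for sym, child in node.children.items():
            p = pat + (sym,)
            out[p] = (len(child.seq_ids) / n_seqs, child.positions)
            stack.append((child, p))
    return out


def mine_contrast(pos_db, neg_db, max_len=3, min_sup=0.5, min_shift=0.3):
    """Patterns frequent in both datasets whose mean relative position differs
    by at least min_shift (a hypothetical stand-in for time distribution
    variation; discreteness constraints are not modeled)."""
    pos = collect(build_suffix_trie(pos_db, max_len), len(pos_db))
    neg = collect(build_suffix_trie(neg_db, max_len), len(neg_db))
    results = []
    for pat, (sup_p, locs_p) in pos.items():
        if sup_p < min_sup or pat not in neg:
            continue
        sup_n, locs_n = neg[pat]
        if sup_n < min_sup:
            continue
        shift = abs(mean(locs_p) - mean(locs_n))
        if shift >= min_shift:
            results.append((pat, round(shift, 3)))
    return sorted(results)


if __name__ == "__main__":
    # "ab" occurs near the start in one dataset and near the end in the other.
    print(mine_contrast(["abcde", "abxyz"], ["xyzab", "cdeab"],
                        max_len=2, min_sup=1.0, min_shift=0.3))
    # → [(('a',), 0.75), (('a', 'b'), 0.75), (('b',), 0.75)]
```

Here every pattern is checked against its own position distribution rather than one user-chosen cut point, which is the motivation the abstract gives for dropping the fixed distinguishing location. A classic "frequent in one dataset, not in the other" check would instead compare `sup_p` and `sup_n` directly.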