A Parallel Algorithm for Tracking Dynamic Communities based on Apache Flink

2018 
Real-world social networks are highly dynamic environments consisting of numerous users and communities, which makes tracking their evolution a challenging problem. In this work, we propose a parallel algorithm for tracking dynamic communities between consecutive timeframes of a social network, where communities are represented as undirected graphs. Our method compares communities using the widely adopted Jaccard similarity measure and is implemented on top of Apache Flink, a novel framework for parallel and distributed data processing. We evaluate the benefits, in terms of execution time, that parallel processing brings to community tracking on datasets with different quantitative characteristics, derived from two popular social media platforms: Twitter and the Mathematics Stack Exchange Q&A site. Experiments show that our parallel method can calculate the similarity of communities within seconds, even for large social networks with more than 600 communities per timeframe.
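The core comparison step can be illustrated with a small sequential sketch. It is not the authors' Flink implementation: it assumes each community is reduced to its node set (edge sets would work the same way), and the names `match_communities` and its `threshold` parameter are hypothetical. The Jaccard similarity of two communities A and B is |A ∩ B| / |A ∪ B|, and every community in one timeframe is compared with every community in the next.

```python
from itertools import product


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two node sets."""
    union = a | b
    if not union:
        return 0.0  # treat two empty communities as unrelated
    return len(a & b) / len(union)


def match_communities(prev: dict, curr: dict, threshold: float = 0.3) -> list:
    """Compare every community of timeframe t with every community of t+1.

    Assumption: communities are given as {community_id: frozenset_of_nodes}.
    Returns (prev_id, curr_id, similarity) for pairs at or above the
    hypothetical `threshold`.
    """
    matches = []
    # All-pairs comparison: this cross product is the work a parallel
    # engine would split across workers.
    for (pid, pnodes), (cid, cnodes) in product(prev.items(), curr.items()):
        sim = jaccard(pnodes, cnodes)
        if sim >= threshold:
            matches.append((pid, cid, sim))
    return matches


if __name__ == "__main__":
    t1 = {"A": frozenset({1, 2, 3, 4}), "B": frozenset({5, 6, 7})}
    t2 = {"X": frozenset({1, 2, 3, 8}), "Y": frozenset({6, 7, 9, 10})}
    print(match_communities(t1, t2))  # [('A', 'X', 0.6), ('B', 'Y', 0.4)]
```

The number of comparisons grows with the product of the community counts of the two timeframes, which is why distributing this cross product is where parallel processing is expected to pay off.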