Parallel algorithms for clustering biological graphs on distributed and shared memory architectures

2014 
Graph algorithms on parallel architectures present an interesting case study for irregular applications. In this paper, we address one such irregular application: clustering real-world graphs constructed from biological data on parallel computers. We present the design and evaluation of two parallel implementations of a serial graph clustering heuristic, the Shingling heuristic developed by Gibson et al. With the OpenMP shared memory implementation, pClust-sm, we improved both the asymptotic runtime and memory complexities of the serial implementation and reduced the time to solution from several days to a few minutes on larger inputs (∼100 M edges). With the Hadoop MapReduce implementation, pClust-mr, we demonstrated linear scaling up to 64 cores on modest-sized inputs (∼11 M edges) and extended the problem size reach by about two orders of magnitude relative to a serial implementation.
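For readers unfamiliar with the Shingling heuristic, the following is a minimal serial sketch of a single shingling pass, written from the general min-wise hashing idea rather than from the pClust code. The function name `shingle_pass`, the parameters `s` (shingle size) and `c` (number of random trials), the linear hash used in place of a true random permutation, and the choice to skip vertices with fewer than `s` neighbors are all assumptions made for illustration. Vertices are assumed to have integer identifiers.

```python
import random
from collections import defaultdict


def shingle_pass(adjacency, s=2, c=5, seed=0):
    """Group vertices by the shingles drawn from their neighbor sets.

    adjacency: dict mapping an integer vertex id to a set of integer neighbor ids.
    Returns a dict mapping (trial, shingle) to the set of vertices that produced it.
    """
    rng = random.Random(seed)
    prime = 2_147_483_647  # Mersenne prime 2^31 - 1
    # Assumption: one random linear hash per trial stands in for a random permutation.
    hashes = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(c)]

    groups = defaultdict(set)
    for vertex, neighbors in adjacency.items():
        if len(neighbors) < s:
            continue  # assumption: too few neighbors to form a shingle of size s
        for trial, (a, b) in enumerate(hashes):
            # Rank neighbors by hash value; ties are broken by the neighbor id.
            ranked = sorted(neighbors, key=lambda n: ((a * n + b) % prime, n))
            # The s smallest-ranked neighbors form this vertex's shingle for the trial.
            shingle = tuple(sorted(ranked[:s]))
            groups[(trial, shingle)].add(vertex)
    return groups


if __name__ == "__main__":
    toy_graph = {0: {10, 11, 12}, 1: {10, 11, 12}, 2: {20, 21, 22}}
    for key, members in sorted(shingle_pass(toy_graph, s=2, c=3).items()):
        print(key, sorted(members))
```

Vertices whose neighbor sets overlap heavily tend to produce the same shingle in many trials, so the groups returned here are candidate building blocks for dense clusters. How pClust-sm and pClust-mr partition and aggregate this work across threads or MapReduce tasks is not shown in this sketch.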