Parallel algorithms for clustering biological graphs on distributed and shared memory architectures

Inna Rytsareva, Timothy Chapman, Anantharaman Kalyanaraman

2014

Graph algorithms on parallel architectures present an interesting case study for irregular applications. In this paper, we address one such irregular application: clustering real-world graphs constructed from biological data on parallel computers. We present the design and evaluation of two parallel implementations of a serial graph clustering heuristic called the Shingling heuristic, developed by Gibson et al. In the OpenMP shared memory implementation, pClust-sm, we improve both the asymptotic runtime and memory complexities of the serial implementation and reduce the time to solution from several days to a few minutes on larger inputs (∼100 M edges). With the Hadoop MapReduce implementation, pClust-mr, we demonstrate linear scaling up to 64 cores on modest-sized inputs (∼11 M edges) and extend the problem size reach by about two orders of magnitude relative to a serial implementation.
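
For orientation, the core idea behind a shingling-style heuristic can be sketched as a single min-wise hashing pass over adjacency lists: vertices whose neighbor sets overlap heavily tend to produce the same small "shingle" of neighbors and so land in the same group. The sketch below is only an illustration under stated assumptions, not the pClust-sm or pClust-mr code. The function name `shingle_pass`, the parameters `s` (shingle size) and `c` (number of random trials), the linear hash family, and the toy graph are all hypothetical choices made for this example, and vertex IDs are assumed to be integers.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1 (assumed modulus for the hash family)


def shingle_pass(adjacency, s=2, c=4, seed=0):
    """Group vertices by the s-element shingles drawn from their neighbor lists.

    Returns a dict mapping (trial index, shingle) to the set of vertices
    that produced that shingle in that trial.
    """
    rng = random.Random(seed)
    # One (a, b) pair per trial defines the hash h(x) = (a*x + b) mod PRIME.
    trials = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(c)]
    groups = defaultdict(set)
    for v, neighbors in adjacency.items():
        if len(neighbors) < s:
            continue  # too few neighbors to form an s-element shingle
        for t, (a, b) in enumerate(trials):
            # Rank neighbors by hash value; the s smallest form the shingle.
            ranked = sorted(neighbors, key=lambda x: ((a * x + b) % PRIME, x))
            shingle = tuple(sorted(ranked[:s]))
            groups[(t, shingle)].add(v)
    return groups


# Toy input (hypothetical): vertices 0-2 share one neighborhood, 3-4 another.
adjacency = {
    0: [10, 11, 12],
    1: [10, 11, 12],
    2: [10, 11, 12],
    3: [20, 21],
    4: [20, 21],
}
groups = shingle_pass(adjacency)
for key, members in sorted(groups.items()):
    print(key, sorted(members))
```

In this toy run, vertices with identical neighbor lists always share a shingle, so each trial yields one group for vertices 0 to 2 and one for vertices 3 and 4. A full clustering procedure would still need further steps on top of this grouping, which are omitted here.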
