Accelerating Large-Scale Data Analysis By Offloading To High-Performance Computing Libraries Using A

Authors:
Alex Gittens, Rensselaer Polytechnic Institute
Kai Rothauge, University of California, Berkeley
Shusen Wang, University of California, Berkeley
Michael Mahoney, University of California, Berkeley
Lisa Gerhardt, NERSC/LBNL
Prabhat, NERSC/LBNL
Jey Kottalam, University of California, Berkeley
Michael Ringenburg, Cray Inc.
Kristyn Maschhoff, Cray Inc.

Introduction:

Many linear algebra computations that are the basis for solving common machine learning problems are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI).

Abstract:

Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations (in particular, many linear algebra computations that are the basis for solving common machine learning problems) are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI).
