Micro-Architectural Characterization of Apache Spark on Batch and Stream Processing Workloads

Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguadé

2016

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has remained at the forefront of big data analytics as a unified framework for both batch and stream data processing. However, recent studies on the micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our experiments, we found that batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching. If the input data rates are small, stream processing workloads are front-end bound. However, front-end bound stalls are reduced at larger input data rates, and instruction retirement improves. Moreover, Spark workloads using DataFrames show better instruction retirement than workloads using RDDs.
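The terms "front-end bound" and "instruction retirement" correspond to categories of Intel's Top-Down Microarchitecture Analysis Method, which splits the pipeline's issue slots into Frontend Bound, Bad Speculation, Retiring, and Backend Bound. The sketch below computes these level-1 fractions from raw counter values. It assumes a 4-wide Intel core and the published level-1 formulas with Intel event names such as `IDQ_UOPS_NOT_DELIVERED.CORE`; the abstract does not state which exact formulas or events the authors used, and the `Counters`, `top_down_level1`, and `dominant_category` names are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Assumption: 4 issue slots per cycle, as on Intel Sandy Bridge through Skylake cores.
PIPELINE_WIDTH = 4


@dataclass
class Counters:
    """Raw event counts for one measured run (hypothetical container)."""
    cpu_clk_unhalted_thread: int      # CPU_CLK_UNHALTED.THREAD
    idq_uops_not_delivered_core: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued_any: int              # UOPS_ISSUED.ANY
    uops_retired_retire_slots: int    # UOPS_RETIRED.RETIRE_SLOTS
    int_misc_recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def top_down_level1(c: Counters) -> dict:
    """Return the fraction of issue slots spent in each level-1 Top-Down category."""
    slots = PIPELINE_WIDTH * c.cpu_clk_unhalted_thread
    # Slots where the front end delivered no uop although the back end could accept one.
    frontend = c.idq_uops_not_delivered_core / slots
    # Uops issued but never retired, plus slots lost while recovering from mispredictions.
    bad_spec = (c.uops_issued_any - c.uops_retired_retire_slots
                + PIPELINE_WIDTH * c.int_misc_recovery_cycles) / slots
    # Slots that retired a uop, i.e. useful work.
    retiring = c.uops_retired_retire_slots / slots
    # Whatever remains is attributed to back-end stalls (memory or execution units).
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


def dominant_category(fractions: dict) -> str:
    """Name of the category that consumes the largest share of issue slots."""
    return max(fractions, key=fractions.get)


if __name__ == "__main__":
    # Invented placeholder counts, NOT measurements from the paper.
    sample = Counters(
        cpu_clk_unhalted_thread=1_000_000,
        idq_uops_not_delivered_core=1_600_000,
        uops_issued_any=1_500_000,
        uops_retired_retire_slots=1_200_000,
        int_misc_recovery_cycles=50_000,
    )
    fractions = top_down_level1(sample)
    for name, value in fractions.items():
        print(f"{name:16s} {value:.3f}")
    print("dominant:", dominant_category(fractions))
```

With these placeholder counts the run would be classified as front-end bound (0.400 of slots), the situation the abstract reports for stream processing at small input data rates; a higher Retiring fraction is what "improved instruction retirement" would look like in this breakdown.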
