Evaluating Performance Tradeoffs on the Radeon Open Compute Platform

2018 
GPUs have been shown to deliver impressive computing performance, while also providing high energy efficiency, across a wide range of high-performance and embedded system workloads. However, limited support for efficient communication and synchronization between the CPU and the GPU limits our ability to fully exploit the benefits of heterogeneous systems. The Heterogeneous System Architecture (HSA) was introduced to address these synchronization and communication issues, but given its low-level nature, it was not easily adopted by the broader programming community. In 2016, AMD described the Radeon Open Compute (ROC) platform, which brings high-level programming frameworks such as OpenCL, HC++, and HIP to end users. These frameworks offer a simpler programming experience by wrapping complex HSA APIs, while still delivering the power of HSA. To date, there has been little evaluation of the potential performance benefits and trade-offs of leveraging the ROC platform. In this work, we evaluate the performance of the ROC platform using the Hetero-Mark and DNNMark benchmark suites. Using Hetero-Mark, we compare the performance of different programming frameworks, including OpenCL, HC++, and HIP, on both integrated APUs and discrete GPUs. We also present three new CPU-GPU collaborative patterns and employ three new benchmarks to evaluate system-level atomics. With DNNMark and a new DNN Face Detection benchmark, we evaluate the performance of ROC libraries, including rocBLAS and MIOpen. We also provide guidance on best practices for programmers developing applications that leverage the ROC platform.