Heterogeneous parallel_for Template for CPU–GPU Chips

2019 
Heterogeneous processors, comprising CPU cores and a GPU, are the de facto standard in desktop and mobile platforms. In many cases it is worthwhile to exploit both the CPU and GPU simultaneously. However, the workload distribution poses a challenge when running irregular applications. In this paper, we present LogFit, a novel adaptive partitioning strategy for parallel loops, specially designed for applications with irregular data accesses running on heterogeneous CPU–GPU architectures. Our algorithm dynamically finds the optimal chunk size that must be assigned to the GPU. Also, the number of iterations assigned to the CPU cores are adaptively computed to avoid load unbalance. In addition, we also strive to increase the programmer’s productivity by providing a high level template that eases the coding of heterogeneous parallel loops. We evaluate LogFit’s performance and energy consumption by using a set of irregular benchmarks running on a heterogeneous CPU–GPU processor, an Intel Haswell. Our experimental results show that we outperform Oracle-like static and other dynamic state-of-the-art approaches both in terms of performance, up to 57%, and energy saving, up to 31%.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    13
    Citations
    NaN
    KQI
    []