Applying Victim Cache in High Performance GPGPU Computing

2016 
Modern GPGPUs employ thousands of threads for parallel execution. These massive thread counts often compete for the small first-level data (L1D) cache, which leads to severe cache thrashing and hurts GPGPU performance. In this paper, we apply the victim cache design to GPGPUs to alleviate the L1D cache thrashing problem and improve data locality and system performance. Instead of the traditional small, fully associative victim cache design, we first restructure the victim cache to meet the needs of the large number of concurrent threads common in GPGPU applications. Then, we propose using unallocated registers, identified by the compiler, to provide additional storage for victim cache data. Experimental results show that our approach substantially increases the on-chip data cache hit rate, yielding an average performance improvement of 32.7% with only small changes to the GPGPU design.
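The abstract does not give implementation details, so as background on the baseline idea it builds on, here is a minimal Python sketch of a direct-mapped cache backed by a small fully associative victim buffer: lines evicted from the main cache are held in the victim buffer, and a subsequent access that misses in the main cache but hits in the buffer avoids a full miss. All class names and parameters here are hypothetical, not from the paper, and the paper's GPGPU-specific restructuring and register-based storage are not modeled.

```python
from collections import OrderedDict

class VictimCache:
    """Small fully associative victim buffer with LRU replacement (illustrative)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()          # address -> cached data

    def insert(self, addr, data):
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least-recently-inserted victim

    def probe(self, addr):
        # On a victim hit the line is removed; it moves back into the main cache.
        return self.lines.pop(addr, None)

class L1DWithVictim:
    """Direct-mapped L1D cache backed by a victim buffer (conceptual sketch)."""
    def __init__(self, num_sets, victim_capacity):
        self.num_sets = num_sets
        self.sets = {}                      # set index -> (address, data)
        self.victim = VictimCache(victim_capacity)
        self.hits = self.victim_hits = self.misses = 0

    def access(self, addr):
        idx = addr % self.num_sets
        entry = self.sets.get(idx)
        if entry and entry[0] == addr:
            self.hits += 1                  # regular L1D hit
            return entry[1]
        data = self.victim.probe(addr)
        if data is not None:
            self.victim_hits += 1           # victim hit: full miss avoided
        else:
            self.misses += 1
            data = f"line@{addr}"           # simulate fetch from lower level
        if entry:
            # The displaced L1D line becomes a victim instead of being dropped.
            self.victim.insert(entry[0], entry[1])
        self.sets[idx] = (addr, data)
        return data
```

With two conflicting addresses that map to the same set (e.g. 0 and 2 with two sets), the third access finds its line in the victim buffer rather than re-fetching it, which is exactly the conflict-miss pattern that thrashing GPGPU threads would otherwise amplify.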