Towards Automating Multi-dimensional Data Decomposition for Executing a Single-GPU Code on a Multi-GPU System

2016 
In this paper, we present a data decomposition method for multi-dimensional data that aims to enable multi-GPU (graphics processing unit) acceleration of compute unified device architecture (CUDA) code written for a single GPU. Our multi-dimensional method extends a previous method that handles only one-dimensional (1-D) data. The method performs a sample run of selected GPU threads to decompose large data into small segments, thereby avoiding exhaustion of GPU memory. Compared with the previous method, our multi-dimensional method produces smaller segments, which reduces both GPU memory consumption and the amount of CPU-GPU data transfer. In experiments using matrix multiplication, the presented method consumed less GPU memory than the previous method and thereby successfully processed 29 times larger matrices, provided that the matrices fit into CPU memory. However, we found that the index transformation needed for multi-dimensional decomposition reduced the effective performance by 28%.
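To make the idea of smaller segments and index transformation concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes square n x n matrices, a 1-D scheme that splits only the rows of the output (so every segment needs all of B), and a 2-D scheme that splits the output into tiles (so every segment needs one row band of A and one column band of B). The function names `segment_footprint_1d`, `segment_footprint_2d`, `to_local`, and `matmul_tiled` are hypothetical, and the GPU is only simulated by ordinary loops.

```python
# Illustrative sketch only: the tiling scheme and all names are assumptions,
# not the decomposition algorithm described in the paper.

def segment_footprint_1d(n, rows_per_seg):
    """Elements held per segment when C = A * B is split by row bands only."""
    # row band of A + all of B + row band of C
    return rows_per_seg * n + n * n + rows_per_seg * n

def segment_footprint_2d(n, tile):
    """Elements held per segment when C is split into tile x tile blocks."""
    # row band of A + column band of B + one tile of C
    return tile * n + n * tile + tile * tile

def to_local(global_row, global_col, row0, col0):
    """Index transformation: global matrix index -> index inside a segment."""
    return global_row - row0, global_col - col0

def matmul_tiled(A, B, tile):
    """Multiply square matrices one 2-D segment at a time (GPU simulated)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for row0 in range(0, n, tile):
        for col0 in range(0, n, tile):
            rows = range(row0, min(row0 + tile, n))
            cols = range(col0, min(col0 + tile, n))
            # Segments that would be copied to the device for this tile.
            a_seg = [A[i] for i in rows]
            b_seg = [[B[k][j] for j in cols] for k in range(n)]
            for i in rows:
                for j in cols:
                    # Every access into a segment needs a translated index.
                    li, lj = to_local(i, j, row0, col0)
                    C[i][j] = sum(a_seg[li][k] * b_seg[k][lj] for k in range(n))
    return C
```

Under these assumptions, the 2-D segment is smaller because it never holds the whole of B, while the extra subtraction in `to_local` on each access hints at why index transformation can cost performance on a real GPU.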