Efficient Processing of Sparse Tensor Decomposition via Unified Abstraction and PE-interactive Architecture

2021 
We propose a novel architecture to efficiently perform sparse tensor decomposition (SpTD). As a generalization of vectors and matrices, tensors are widely used to process high-dimensional data. SpTD is not only an emerging tensor analysis technique but also an effective tool for reducing the storage and computation costs of tensors. However, conventional general-purpose processors are inefficient at performing SpTD, mainly because of i) variable sparsity and flexible buffer-size requirements, and ii) the difficulty of fusing multiple execution kernels for better performance. For designers of domain-specific accelerators, the diversity of decomposition algorithms is a further problem that must be addressed. We propose a unified abstraction for SpTD algorithms and design a specialized accelerator. First, we formulate two types of core kernels (SpLrMM and LrSampling) that serve as a standard form fitting a broad range of SpTD algorithms. Second, we design a sparse tensor engine (STE) to perform SpTD efficiently. STE uses a processing element (PE)-interactive architecture in which PEs can be flexibly grouped via a Network-on-Chip to share buffer capacity, bandwidth, and compute resources. We evaluate our accelerator with extensive experiments, and it achieves an average speedup of 45x over a CPU and 29x over a GPU.
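The abstract names the SpLrMM kernel without defining it. As a rough illustration only, and assuming the name refers to a sparse matrix multiplied by a dense factor matrix with few columns (low rank), the access pattern might look like the sketch below. The function name `sparse_times_low_rank`, the COO input layout, and the sample data are hypothetical and are not taken from the paper.

```python
def sparse_times_low_rank(coo_entries, factor, num_rows):
    """Multiply a sparse matrix by a dense factor matrix.

    coo_entries: iterable of (row, col, value) triples for the nonzeros
                 (assumed COO layout, not the paper's storage format).
    factor:      dense matrix as a list of rows; its column count is the rank.
    num_rows:    number of rows of the sparse matrix.
    """
    rank = len(factor[0])
    out = [[0.0] * rank for _ in range(num_rows)]
    # Only nonzeros are visited, so the work scales with nnz * rank.
    for row, col, value in coo_entries:
        factor_row = factor[col]
        for r in range(rank):
            out[row][r] += value * factor_row[r]
    return out


# A 3x4 sparse matrix with three nonzeros, times a 4x2 factor matrix.
entries = [(0, 1, 2.0), (2, 3, -1.0), (2, 0, 0.5)]
factor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
result = sparse_times_low_rank(entries, factor, num_rows=3)
```

Because the amount of work and the rows of `factor` that are touched depend on where the nonzeros fall, a kernel of this shape has the irregular buffer and bandwidth demands that the abstract cites as motivation for grouping PEs.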