Role of on-chip networks in building domain-specific architectures (DSAs) for sparse computations (invited)

2020 
DSAs for machine learning (ML) such as the Google TPU, Microsoft Brainwave, and Xilinx xDNN are becoming prominent because of their high energy efficiency and performance. These DSAs perform dense linear algebra efficiently by minimizing data movement and exploiting high data reuse, regular memory access patterns, and data locality (temporal and spatial). DSAs for domains like graph analytics and HPC are emerging at a rapid pace as well, where most of the computation revolves around sparse linear algebra, specifically Sparse Matrix-Vector Multiplication (SpMV). SpMV refers to the multiplication of a sparse matrix A by a dense vector x to produce a result vector y. Designing high-performance and energy-efficient DSAs for SpMV is challenging due to highly irregular and random memory access patterns, poor temporal and spatial locality, and very low data-reuse opportunities. SpMV DSAs exploit distributed on-chip memory blocks to store vector entries in order to avoid random off-chip memory accesses. However, a switching network architecture or a crossbar is usually required as a building block for routing matrix non-zero elements to the on-chip memory blocks. In this presentation, we will discuss the network architectures and switches employed in most SpMV DSAs, our SpMV DSA based on a 2D-mesh network, design choices for the FPGA implementation of the DSA, and scalability aspects. For our use case in particular, we will highlight the importance and challenges of achieving energy-efficient data movement using scalable on-chip network architectures.
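To make the access-pattern problem concrete, below is a minimal C sketch of SpMV over a matrix stored in Compressed Sparse Row (CSR) format. The format choice, the spmv_csr function, and the NUM_BANKS bank mapping mentioned in the comments are illustrative assumptions, not details from the talk; the data-dependent indices in col_idx are what make accesses to x irregular and what motivate the distributed on-chip vector banks and routing network described above.

```c
/* Minimal sketch of CSR-format SpMV (y = A * x).  The accesses
 * x[col_idx[k]] are data-dependent, which is why SpMV shows poor
 * locality and little reuse.  NUM_BANKS is a hypothetical bank count
 * used only to illustrate routing non-zeros to on-chip vector banks. */
#include <stdio.h>

#define NUM_BANKS 4  /* hypothetical number of on-chip memory banks */

/* y[i] = sum over the non-zeros of row i of A(i,j) * x[j] */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {
        double acc = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            int j = col_idx[k];
            /* In a banked design, a switch or crossbar would route
             * this request to bank (j % NUM_BANKS); in software we
             * simply index x directly. */
            acc += val[k] * x[j];
        }
        y[i] = acc;
    }
}

int main(void)
{
    /* 3x3 sparse matrix:
     *   [ 1 0 2 ]
     *   [ 0 3 0 ]
     *   [ 4 0 5 ]
     */
    int row_ptr[] = {0, 2, 3, 5};
    int col_idx[] = {0, 2, 1, 0, 2};
    double val[]  = {1, 2, 3, 4, 5};
    double x[]    = {1, 1, 1};
    double y[3];

    spmv_csr(3, row_ptr, col_idx, val, x, y);
    for (int i = 0; i < 3; i++)
        printf("y[%d] = %g\n", i, y[i]);  /* expected: 3, 3, 9 */
    return 0;
}
```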