A Domain-Specific Architecture for Accelerating Sparse Matrix Vector Multiplication on FPGAs

2020 
FPGAs allow custom memory hierarchies and flexible data movement with highly fine-grained control. These capabilities are critical for building high-performance, energy-efficient domain-specific architectures (DSAs), especially for workloads with irregular memory accesses and data-dependent communication patterns. Sparse linear algebra operations, especially sparse matrix vector multiplication (SpMV), are examples of such workloads and are becoming important due to their use in numerous areas of science and engineering. Existing FPGA-based DSAs for SpMV do not allow customization through plug-and-play of their building blocks. For example, most of these DSAs require a switching network/crossbar architecture as a building block for routing matrix data to banked vector memory blocks. In this paper, we first present an approach in which a custom network is built from simple blocks arranged in a regular fashion to exploit low-level architecture details. We then use this network to replace the expensive crossbars employed in the GEMX SpMV engine and develop an end-to-end tool-flow around a mixed-IP (HLS/RTL) approach. Due to the modularity of our design, our tool-flow allows us to insert an additional block in the design to guarantee zero stalls in the accumulation stage. On an Alveo U200, we report performance of up to 4.4 GFLOPS (92% peak bandwidth utilization) using our accelerator attached to one DDR4 channel.
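For context, SpMV over a matrix in compressed sparse row (CSR) format illustrates the irregular, data-dependent access pattern that motivates banked vector memory and a routing network on FPGAs. The following is a minimal software sketch, not the GEMX engine's actual kernel; the function and variable names are illustrative:

```cpp
#include <vector>
#include <cstddef>

// Minimal CSR SpMV: y = A * x.
// The gather x[col_idx[j]] is the data-dependent read that, on an FPGA
// with banked vector memory, must be routed to the bank holding that
// element -- the role played by a crossbar (or, in this paper, by a
// custom network built from simple blocks).
void spmv_csr(const std::vector<std::size_t>& row_ptr,  // size nrows + 1
              const std::vector<std::size_t>& col_idx,  // size nnz
              const std::vector<float>& vals,           // size nnz
              const std::vector<float>& x,              // dense input vector
              std::vector<float>& y) {                  // dense output vector
    const std::size_t nrows = row_ptr.size() - 1;
    for (std::size_t i = 0; i < nrows; ++i) {
        float acc = 0.0f;                    // per-row accumulation stage
        for (std::size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
            acc += vals[j] * x[col_idx[j]];  // irregular, data-dependent read
        }
        y[i] = acc;
    }
}
```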