A scalable hybrid architecture for high performance data-parallel applications

2017 
This paper presents a scalable hybrid architecture for high performance data-parallel applications on tightly coupled shared-memory CPU-FPGA systems such as the Xilinx Zynq SoC. The aims of the proposed architecture are: 1) to simplify the development of hardware acceleration for data-parallel applications; 2) to reach the performance limit imposed by memory access and/or the hardware resources available on an FPGA; 3) to reduce the overhead caused by task scheduling and device drivers. The proposed architecture can be used as a generic template for implementing data-parallel applications. Each task in an application is mapped to one hardware accelerator, called a “kernel”. Several identical instances of each hardware kernel execute concurrently to provide parallelism. By deploying the maximum number of instances of the hardware kernel, we make full use of the memory access bandwidth and the resources available on the FPGA. To improve performance further, task scheduling and device drivers are implemented in FPGA hardware as a hardware scheduler called DmaScheduler. Experimental results show speedups of 2.93x to 51.25x on a Zynq FPGA for image processing, Black-Scholes option pricing, matrix multiplication, and clustering applications, compared with existing FPGA implementations.
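The idea of running several identical kernel instances over partitioned data can be illustrated with a small software analogy. This is only a sketch and not the paper's implementation: the names `max_instances`, `split`, `run_data_parallel`, and `scale_kernel`, the resource and bandwidth figures, and the assumption that the instance count is the smaller of a resource bound and a bandwidth bound are all hypothetical. Threads stand in for hardware kernel instances.

```python
from concurrent.futures import ThreadPoolExecutor


def max_instances(fpga_units, units_per_kernel, mem_bandwidth, bandwidth_per_kernel):
    """Assumed model: instance count is capped by FPGA resources and by memory bandwidth."""
    by_resources = fpga_units // units_per_kernel
    by_bandwidth = int(mem_bandwidth // bandwidth_per_kernel)
    return max(1, min(by_resources, by_bandwidth))


def split(data, parts):
    """Split data into up to `parts` contiguous, nearly equal chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when parts > len(data)
            chunks.append(data[start:end])
        start = end
    return chunks


def run_data_parallel(kernel, data, n_instances):
    """Run identical kernel instances concurrently, one chunk per instance."""
    chunks = split(data, n_instances)
    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        results = pool.map(kernel, chunks)  # map preserves chunk order
    return [y for chunk in results for y in chunk]


def scale_kernel(chunk):
    """Hypothetical stand-in for a hardware kernel: doubles each element."""
    return [2 * x for x in chunk]


# Illustrative numbers only: 100 resource units, 30 per kernel,
# 4.0 units of memory bandwidth, 0.5 consumed per kernel instance.
n = max_instances(fpga_units=100, units_per_kernel=30,
                  mem_bandwidth=4.0, bandwidth_per_kernel=0.5)
output = run_data_parallel(scale_kernel, list(range(10)), n)
```

In this toy setting the resource bound is the tighter one, so adding memory bandwidth would not allow more instances. In the described architecture the dispatch step is performed by the hardware DmaScheduler rather than by software threads.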