Offloading Collective Operations to Programmable Logic

2017 
The authors describe their architecture and implementation for offloading collective operations to programmable logic in the communication substrate. Collective operations are widely used in parallel processing and are central to the message passing interface (MPI) programming model, so the way they are designed and implemented affects the performance of many high-performance computing applications. The programmable logic provided by field-programmable gate arrays (FPGAs) is a powerful option for creating task-specific logic to aid applications. The authors' approach applies wherever programmable logic is present in the communication pipeline, and it can be used to accelerate various network-based operations. In this article, the authors present a general collective offloading framework for applications that use MPI. They evaluate their approach on the Xilinx Zynq system on a chip and on an FPGA-based network interface card called the NetFPGA. Results are presented from both microbenchmarks and a benchmark scientific application using MPI.