Fleet: A Framework for Massively Parallel Streaming on FPGAs

James J. Thomas,Pat Hanrahan,Matei Zaharia

Fleet: A Framework for Massively Parallel Streaming on FPGAs

2020

We present Fleet, a framework that offers a massively parallel streaming model for FPGAs and is effective in a number of domains well-suited for FPGA acceleration, including parsing, compression, and machine learning. Fleet requires the user to specify RTL for a processing unit that serially processes every input token in a stream, a far simpler task than writing a parallel processing unit. It then takes the user's processing unit and generates a hardware design with many copies of the unit as well as memory controllers to feed the units with separate streams and drain their outputs. Fleet includes a Chisel-based processing unit language. The language maintains Chisel's low-level performance control while adding a few productivity features, including automatic handling of ready-valid signaling and a native and automatically pipelined BRAM type. We evaluate Fleet on six different applications, including JSON parsing and integer compression, fitting hundreds of Fleet processing units on the Amazon F1 FPGA and outperforming CPU implementations by over 400x and GPU implementations by over 9x in performance per watt while requiring a similar number of lines of code.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations