Decoupling the Multi-Rate Dataflow Execution in Coarse-Grained Reconfigurable Array

2020 
Coarse-grained reconfigurable arrays (CGRAs) driven by dataflow execution are attracting renewed interest as an accelerator architecture with higher energy efficiency. However, as they are adopted in a wider variety of applications, they face complex data and control flows that cause multi-rate execution across different dataflow graphs, which degrades performance. In this paper, we propose a unified storage structure that decouples multi-rate dataflow execution. The structure leverages small distributed buffers with lightweight control. By chaining or aligning these buffers to form larger storage with different control schemes, it caters to the different needs of dataflow decoupling when kernels are mapped onto CGRAs. Our experimental results show that applying the proposed structure to conventional CGRAs can save dozens of processing elements (PEs) for dataflow computing and improve CGRA performance by an average of 2.53× for applications from different domains. We thus provide a more efficient CGRA design for multi-rate dataflow execution.