Extracting memory-level parallelism through reconfigurable hardware traces

2013 
This paper proposes a new FPGA-based embedded computer architecture that focuses on constructing an application-specific memory access network capable of extracting the maximum amount of memory-level parallelism on a per-application basis. Specifically, by performing dynamic memory analysis and exploiting two capabilities of modern FPGA devices, abundant distributed block RAMs and programmability, the proposed reconfigurable architecture synthesizes highly efficient accelerators that enable parallelized memory accesses and thereby accomplish effective data orchestration by maximally extracting the target application's instruction-, loop-, and memory-level parallelism. To validate our proposed architecture, we implemented a baseline embedded processor platform, a conventional CPU+accelerator platform with a single centralized memory, and a prototype based on Xilinx MicroBlaze technology. Our experimental results show that, on average over five benchmark applications from SPEC2006 and MiBench [1], the proposed architecture achieves an 8.6 times speedup over the baseline embedded processor platform and a 1.7 times speedup over the conventional CPU+accelerator platform. More interestingly, the proposed platform achieves a reduction of more than 40% in energy-delay product compared to the conventional CPU+accelerator platform with a centralized memory.
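As a rough reading aid for the reported figures, the sketch below relates speedup to energy-delay product (EDP), using the standard definition EDP = energy × delay. It assumes the 1.7 times average speedup and the 40% EDP reduction describe the same comparison on the same runs, which the abstract does not state; the function names are hypothetical and the result is only an illustrative bound, not a number from the paper.

```python
def edp_ratio(speedup: float, energy_ratio: float) -> float:
    """Return EDP(new) / EDP(old), with EDP defined as energy * delay.

    speedup      -- delay(old) / delay(new)
    energy_ratio -- energy(new) / energy(old)
    """
    delay_ratio = 1.0 / speedup
    return energy_ratio * delay_ratio


def max_energy_ratio(speedup: float, edp_reduction: float) -> float:
    """Largest energy(new)/energy(old) that still yields the given EDP reduction."""
    return (1.0 - edp_reduction) * speedup


# Figures quoted in the abstract (vs. the conventional CPU+accelerator platform).
SPEEDUP_VS_ACCEL = 1.7
EDP_REDUCTION = 0.40  # "more than 40%", so this is the weakest case

# Assumption: both figures apply to the same runs.
bound = max_energy_ratio(SPEEDUP_VS_ACCEL, EDP_REDUCTION)
print(round(bound, 2))  # 1.02
```

Under that assumption, an EDP reduction of at least 40% at a 1.7 times speedup would mean the proposed platform uses at most about 2% more energy than the conventional CPU+accelerator platform, so most of the EDP gain would come from the shorter run time.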