Fast Simulation of a Many-NPU Network-on-Chip for Microarchitectural Design Space Exploration

2021 
A viable solution to cope with the ever-increasing computational complexity of deep learning applications is to integrate many neural processing units (NPUs) in a chip where a network-on-chip (NoC) is used as the communication fabric. Since the design space of an NoC is huge, the network topology is first selected based on the communication patterns of applications with a high-level performance estimation method. After the network topology is selected, microarchitectural design space exploration is performed with a cycle-level NoC simulator. However, existing NoC simulators are so slow that design space exploration of the microarchitecture is usually conducted manually in a narrow space. Because synthetic traces are used, the simulation accuracy is also limited. To overcome these weaknesses, we present a simulation technique that is fast and accurate enough for microarchitectural design space exploration of an NoC. In the proposed technique, we use real communication traces obtained from a many-NPU simulation that does not model the NoC. To this end, we define a trace format that serves as the interface between the many-NPU simulator and the NoC simulator. To accelerate simulation, we propose a cluster-level parallelization technique for the simulation of a hierarchical NoC. The key technique is to manage the timestamps of events at the cluster boundary so that no time synchronization error occurs. We also adjust the abstraction level of the simulation models to reduce the number of modules in the SystemC NoC simulation. With the proposed technique, we achieved a speed-up of up to 40 times for a 32-NPU system, compared with the FlexNoC simulator.
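The abstract does not specify the trace format or the exact timestamp-management scheme, so the following is only a minimal sketch of one common way to simulate clusters independently without timestamp violations: a conservative lookahead window. It is not the authors' implementation. The trace record fields (`time`, `src`, `dst`, `size_flits`), the cluster size, and both latency constants are hypothetical values chosen for illustration, and the NoC itself is reduced to fixed delays.

```python
import heapq
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not values from the paper.
NPUS_PER_CLUSTER = 4   # hypothetical cluster size
INTRA_LATENCY = 2      # hypothetical delivery delay inside a cluster (cycles)
BOUNDARY_LATENCY = 5   # hypothetical delay to cross a cluster boundary (cycles)


@dataclass(order=True)
class Packet:
    """One hypothetical trace record; ordering uses the timestamp only."""
    time: int
    src: int = field(compare=False)
    dst: int = field(compare=False)
    size_flits: int = field(compare=False)


def cluster_of(npu: int) -> int:
    return npu // NPUS_PER_CLUSTER


class Cluster:
    def __init__(self, cid: int):
        self.cid = cid
        self.queue = []       # min-heap of pending packets, ordered by time
        self.now = 0          # local simulation time of this cluster
        self.delivered = []   # (delivery_time, src, dst)

    def push(self, pkt: Packet) -> None:
        heapq.heappush(self.queue, pkt)

    def next_time(self):
        return self.queue[0].time if self.queue else float("inf")

    def run_until(self, horizon):
        """Process local events with time < horizon; return boundary packets."""
        outgoing = []
        while self.queue and self.queue[0].time < horizon:
            pkt = heapq.heappop(self.queue)
            # A violation here would mean an event arrived in this cluster's past.
            assert pkt.time >= self.now, "time synchronization error"
            self.now = pkt.time
            if cluster_of(pkt.dst) == self.cid:
                self.delivered.append((pkt.time + INTRA_LATENCY, pkt.src, pkt.dst))
            else:
                # Stamp the packet with its arrival time at the remote cluster.
                outgoing.append(Packet(pkt.time + BOUNDARY_LATENCY,
                                       pkt.src, pkt.dst, pkt.size_flits))
        return outgoing


def simulate(trace, num_clusters):
    clusters = [Cluster(i) for i in range(num_clusters)]
    for pkt in trace:
        clusters[cluster_of(pkt.src)].push(pkt)
    while True:
        t_min = min(c.next_time() for c in clusters)
        if t_min == float("inf"):
            break
        # Boundary packets created in this window arrive at or after the
        # horizon, so no cluster can receive an event in its own past.
        horizon = t_min + BOUNDARY_LATENCY
        outgoing = []
        for c in clusters:    # iterations are independent; could run in parallel
            outgoing.extend(c.run_until(horizon))
        for pkt in outgoing:
            clusters[cluster_of(pkt.dst)].push(pkt)
    return clusters


trace = [Packet(0, 0, 1, 4), Packet(3, 0, 5, 4), Packet(1, 6, 2, 4)]
result = simulate(trace, num_clusters=2)
```

In this sketch the window length equals the boundary-crossing delay, which is what allows the per-cluster loop to run in parallel safely: every packet exchanged between clusters is timestamped beyond the window in which it was produced.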