A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression

2019 
Non-maximum suppression (NMS) is an indispensable post-processing step in face detection. The vast majority of face detection methods need NMS to merge the candidate face boxes that belong to the same face. However, standard NMS is a greedy, local optimization technique that suffers from several shortcomings, such as high complexity ($O(N^2)$), high latency, and large power consumption. This brief alleviates these problems by presenting an efficient hardware architecture for NMS, and it also optimizes the calculation unit to reduce area. Building on multi-thread computing, the design uses a sliding window to obtain parallelism and a position-based bit table technique to improve data access and data reuse, which greatly decreases the cost of memory access and the power consumption. The proposed hardware architecture is implemented in TSMC 28-nm technology. Experiments show that the power consumption is 6.142 mW and the latency is 12.79 $\mu$s to cluster 1000 candidate boxes; the energy efficiency is higher than that of state-of-the-art methods by $3798\times$ and $358\times$, respectively.
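
For orientation, the standard greedy NMS that the abstract takes as its baseline can be sketched in software as follows. This is a minimal illustration of the well-known algorithm, not the hardware architecture proposed in the brief: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default IoU threshold of 0.5 are all assumptions made for the example. The nested loop over remaining candidates is where the $O(N^2)$ worst-case cost comes from.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the brief
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    keep: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        # Compare the kept box against every lower-scored candidate:
        # this inner loop gives the O(N^2) worst case.
        for j in order[pos + 1:]:
            if not suppressed[j] and iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep
```

As a usage example, calling `greedy_nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])` keeps the first and third boxes, because the second overlaps the first with an IoU of about 0.68 and is suppressed.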