Optimized Parallel Implementation of Face Detection Based on Embedded Heterogeneous Many-Core Architecture

2017 
Computing performance is one of the key problems in embedded systems for high-resolution face detection applications. To improve the computing performance of embedded high-resolution face detection systems, a novel parallel implementation of embedded face detection system was established based on a low power CPU-Accelerator heterogeneous many-core architecture. First, a basic CPU version of face detection prototype was implemented based on the cascade classifier and Local Binary Patterns operator. Second, the prototype was extended to a specified embedded parallel computing platform that is called Parallella and consists of Xilinx Zynq and Adapteva Epiphany. Third, the face detection algorithm was optimized to adapt to the Parallella architecture to improve the detection speed and the utilization of computing resources. Finally, a face detection experiment was conducted to evaluate the computing performance of the proposal in this paper. The experimental results show that the proposed implementation obtained a very consistent accuracy as that of the dual-core ARM, and achieved 7.8 times speedup than that of the dual-core ARM. Experiment results prove that the proposed implementation has significant advantages on computing performance.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    3
    Citations
    NaN
    KQI
    []