Programmable data parallel accelerator for mobile computer vision

2015 
The demand for high-performance yet extremely low-power multimedia accelerators for mobile communication is ever growing. To meet this challenge, this paper proposes a novel approach based on a very low-power programmable transport triggered architecture (TTA) processor. The processor is benchmarked with two OpenCL computer vision applications: depth estimation and face detection. The former is an excellent example of a highly parallel algorithm that suits our TTA processor extremely well, whereas the latter is an example of a more serial algorithm that poses a challenge for GPU-style parallel platforms. For a fair comparison of performance and energy efficiency, both algorithms are also implemented and optimized for a high-throughput AMD Radeon HD 7750 GPU, a Qualcomm Adreno 330 mobile GPU, and an Intel Core i5-480M CPU. These platforms are chosen because they can all be programmed with OpenCL with equivalent programming effort. We show that our approach meets real-time requirements and easily outperforms both GPUs as well as the CPU in terms of throughput per watt, making it an excellent candidate for power-constrained mobile platforms.