ParalOS: A Scheduling & Memory Management Framework for Heterogeneous VPUs

2021 
Embedded systems today face the challenge of rapidly growing application diversity, accompanied by increased programming and computational complexity. Customised heterogeneous System-on-Chip (SoC) processors are emerging as an attractive HW solution in various application domains. However, they still require sophisticated SW development to provide efficient implementations, at the expense of slower adaptation to algorithmic changes. In this context, the current paper proposes a framework for accelerating the SW development of computationally intensive applications on Vision Processing Units (VPUs), while still enabling the exploitation of their full HW potential via low-level kernel optimisations. Our framework is tailored for heterogeneous architectures and integrates a dynamic task scheduler, a novel scratchpad memory management scheme, I/O and inter-process communication techniques, as well as a visual profiler. We evaluate our work on the Intel Movidius Myriad VPUs using synthetic benchmarks and real-world applications, which range from Convolutional Neural Networks (CNNs) to computer vision algorithms. In terms of execution time, our results range from a limited ~8% performance overhead versus optimised CNN programs to a 4.2× performance gain in content-dependent applications. We achieve up to a 33% decrease in scratchpad memory usage versus well-established memory allocators and up to 6× shorter inter-process communication time.