An instrumentation framework for performance analysis of Halide schedules

2021 
Abstract: This work extends Halide with a profiling API that measures events supported by the target processor at application runtime. We demonstrate how developers can use this extension to profile an application's loop levels, the producer and consumer relations between functions, and the threads in parallel regions. We also show that the extension is library-agnostic, so developers can choose the profiling library that best suits their environment. As a case study, we measure data traffic, the number of flops, and clock cycles per instruction on x86 processors, and discuss how the reported results can be used to characterize performance and improve Halide schedules.