A comprehensive study of deep learning compiler bugs

2021 
Deep learning (DL) compilers are increasingly used to generate optimized code that boosts the runtime performance of DL models on specific hardware. Like their traditional counterparts, DL compilers can generate incorrect code, resulting in unexpected model behaviors that may cause catastrophic consequences in mission-critical systems. At the same time, the DL models processed by DL compilers differ fundamentally from imperative programs in that the program logic in DL models is implicit. As such, various characteristics of bugs in traditional compilers need to be revisited in the context of DL compilers. In this paper, we present the first systematic study of DL compiler bugs by analyzing 603 bugs arising in three popular DL compilers (i.e., TVM from Apache, Glow from Facebook, and nGraph from Intel). We analyzed these bugs according to their root causes, symptoms, and the stages of compilation at which they occur. We distill 12 findings and provide a series of guidelines for future work on DL compiler bug detection and debugging. For example, a large portion (nearly 20%) of DL compiler bugs are related to types, especially tensor types. Analyzing these bugs helps in designing new mutation operators (e.g., adding a type cast to a tensor to promote implicit type conversion in subsequent tensor computations) that facilitate type-related bug detection. Further, we developed TVMfuzz as a proof-of-concept application of our findings to test the TVM DL compiler. It generates new tests based on TVM's original test suite; these tests expose 8 TVM bugs that the original test suite misses, demonstrating the usefulness of our findings.
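The cast-insertion mutation mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's TVMfuzz implementation; it is a minimal illustration assuming only TVM's public Relay Python API (relay.var, relay.cast, relay.add, relay.transform.InferType) and a toy two-operand program, showing how an original test program and a mutant with an inserted type cast can both be pushed through type inference to probe the compiler's handling of mixed tensor types.

import tvm
from tvm import relay

def original_program():
    # A trivial element-wise addition over two float32 tensors.
    x = relay.var("x", shape=(4, 4), dtype="float32")
    y = relay.var("y", shape=(4, 4), dtype="float32")
    return relay.Function([x, y], relay.add(x, y))

def mutated_program():
    # Mutation: cast one operand so the subsequent add must reconcile
    # mixed tensor dtypes (float16 vs. float32).
    x = relay.var("x", shape=(4, 4), dtype="float32")
    y = relay.var("y", shape=(4, 4), dtype="float32")
    return relay.Function([x, y], relay.add(relay.cast(x, "float16"), y))

for name, build in [("original", original_program), ("mutated", mutated_program)]:
    mod = tvm.IRModule.from_expr(build())
    try:
        mod = relay.transform.InferType()(mod)
        print(name, "-> type inference succeeded")
    except Exception as err:  # a fuzzing harness deliberately catches everything
        # A cleanly rejected mutant is acceptable; an internal crash or a
        # silently wrong inferred type would point to a type-related bug.
        print(name, "-> type inference raised:", err)

A harness in the spirit of TVMfuzz would apply such mutations to programs drawn from TVM's existing test suite and flag internal errors or unexpected results rather than hand-crafting inputs as above.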