Identifying HPC codes via performance logs and machine learning

2013 
We use supervised learning to enable large-scale analysis of performance logs, with two goals: accurately classifying code runs and understanding the importance of different performance metrics. Previous work has demonstrated structured communication patterns in high-performance codes. By categorizing these patterns, we can identify which code was executed. The ability to identify a code by its performance profile is useful for specializing HPC security systems and for identifying common optimizations for similar codes. We apply supervised machine learning to an extensive data set of real user runs from a high-performance computing center. We employ and modify a rule ensemble method to predict which code was run given a performance log. The naive (unmodified) method achieves greater than 93% accuracy. When modified to allow an "other" class, accuracy increases to greater than 97%. This modification allows an anomalous run to be flagged as not belonging to a previously seen, or acceptable, code and provides additional latitude in monitoring what is run on supercomputing facilities. We conclude by interpreting the resulting rule model, which tells us which components of a code are most distinctive and useful for identification.
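The idea of an "other" class can be illustrated with a minimal sketch. This is not the paper's rule ensemble: it substitutes a plain vote over a handful of hand-written rules, and the feature names (`mpi_time_frac`, `avg_msg_bytes`), the thresholds, the labels, and the `min_support` parameter are all hypothetical. It only shows how a run that does not clearly match any known code might be flagged rather than forced into a known class.

```python
from collections import Counter

# Hypothetical rules: each inspects a performance-log feature dict and
# either votes for a code label or abstains (returns None).
# Feature names and thresholds are invented for illustration.
RULES = [
    lambda log: "code_A" if log["mpi_time_frac"] > 0.5 else None,
    lambda log: "code_A" if log["avg_msg_bytes"] < 1024 else None,
    lambda log: "code_B" if log["mpi_time_frac"] <= 0.5 else None,
    lambda log: "code_B" if log["avg_msg_bytes"] >= 65536 else None,
]


def classify(log, min_support=0.75):
    """Return the winning label, or "other" if the vote is not decisive.

    min_support is an assumed knob: the fraction of non-abstaining rules
    that must agree before a known code label is accepted.
    """
    votes = Counter()
    for rule in RULES:
        label = rule(log)
        if label is not None:
            votes[label] += 1
    if not votes:
        return "other"  # no rule fired: nothing resembles a known code
    label, count = votes.most_common(1)[0]
    support = count / sum(votes.values())
    return label if support >= min_support else "other"


typical_a = {"mpi_time_frac": 0.7, "avg_msg_bytes": 512}
typical_b = {"mpi_time_frac": 0.2, "avg_msg_bytes": 100_000}
mixed = {"mpi_time_frac": 0.7, "avg_msg_bytes": 100_000}

print(classify(typical_a), classify(typical_b), classify(mixed))
```

In this sketch the `mixed` run draws one vote for each known code, so it falls below the support threshold and is flagged as "other" instead of being assigned to either class.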