An online log template extraction method based on hierarchical clustering

2019 
Raw log messages record rich runtime information about systems, networks, and applications, which makes them a good data source for anomaly detection. Log template extraction is an important prerequisite for log sequence anomaly detection. Most existing log template extraction methods are offline, and the few online methods achieve insufficient F1-scores on multi-source log data. To address these shortcomings, an online log template extraction method called LogOHC is proposed. First, the raw log messages are preprocessed, and a distributed word representation (word2vec) is used to vectorize the log messages online. Then, an online hierarchical clustering algorithm is applied, and finally, log templates are generated. Experimental analysis shows that LogOHC achieves a higher F1-score than existing log template extraction methods, is suitable for multi-source log data sets, and has a shorter single-step execution time, which meets the requirements of online real-time processing.
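The pipeline described above (preprocess, vectorize, cluster online, generate templates) can be illustrated with a minimal sketch. The abstract does not give the preprocessing rules, the word2vec configuration, or the details of the hierarchical clustering, so everything below is an assumption made for illustration: the masking regex, the 0.7 similarity threshold, the equal-token-count rule, and the names preprocess, cosine, and OnlineClusterer are all hypothetical. To stay self-contained, a sparse bag-of-words vector stands in for word2vec, and a flat single-pass threshold clustering stands in for LogOHC's online hierarchical clustering. It is not the authors' algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical masking rule: pure numbers, hex values, and IPv4 addresses
# (optionally with a port) are treated as variable fields.
_VARIABLE = re.compile(r"\d+|0x[0-9a-fA-F]+|\d+\.\d+\.\d+\.\d+(:\d+)?")
WILDCARD = "<*>"


def preprocess(message):
    """Split on whitespace and replace variable-looking tokens with a wildcard."""
    return [WILDCARD if _VARIABLE.fullmatch(tok) else tok for tok in message.split()]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(count * b.get(tok, 0) for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Single-pass clustering: each message is processed once, on arrival."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed similarity cutoff
        self.clusters = []  # each entry: {"centroid": Counter, "template": list}

    def add(self, message):
        """Assign one message to a cluster and return that cluster's index."""
        tokens = preprocess(message)
        vec = Counter(tokens)  # bag-of-words stand-in for a word2vec embedding
        best_idx, best_sim = None, self.threshold
        for idx, cluster in enumerate(self.clusters):
            # Assumption: only messages with equal token counts share a template.
            if len(cluster["template"]) != len(tokens):
                continue
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            # No cluster is similar enough, so this message starts a new one.
            self.clusters.append({"centroid": vec, "template": tokens})
            return len(self.clusters) - 1
        cluster = self.clusters[best_idx]
        # Running sum of member vectors; cosine similarity ignores its scale.
        cluster["centroid"] = cluster["centroid"] + vec
        # Positions where the tokens disagree become wildcards.
        cluster["template"] = [
            old if old == new else WILDCARD
            for old, new in zip(cluster["template"], tokens)
        ]
        return best_idx

    def templates(self):
        """Return the current template string of every cluster."""
        return [" ".join(c["template"]) for c in self.clusters]
```

With these assumptions, feeding "User alice logged in" followed by "User bob logged in" places both messages in one cluster (cosine similarity 0.75, above the assumed 0.7 threshold) and yields the template "User <*> logged in". Each call to add touches every existing cluster once, so the per-message cost grows with the number of templates; a hierarchical index such as the one the paper proposes would presumably reduce that search.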