Application of the ELK Technology Stack for Collecting and Analyzing System Event Logs

2021 
Modern research in many fields depends on powerful computing systems and complex software. Many scientific organizations build their own computing systems; one example is the cloud infrastructure of the Joint Institute for Nuclear Research (JINR). Failures and emergency situations inevitably arise during the operation of such large systems, and resolving them relies primarily on the analysis of system event logs. As an infrastructure grows in scale and complexity, log analysis becomes harder and requires additional tools to remain effective. In this paper we share our experience in designing and implementing a system for centralized collection and analysis of system event logs of the JINR cloud infrastructure. We based the system on the Elasticsearch, Logstash, Kibana (ELK) technology stack, which is widely used for similar purposes in many other large scientific computing infrastructures and has proven suitable both for collecting and analyzing event logs of various systems and for a number of other problems in the analysis of semi-structured and unstructured data. Using the fault-tolerance mechanism of the control nodes of the JINR cloud infrastructure as an example, we show that the configuration of a modern system can be dynamic, which complicates the examination of its event logs, and how the developed system simplifies log analysis in such situations.