A parallel clustering algorithm for logs data based on Hadoop platform

2019 
Log analysis is an important way to characterize the running status of a network system and the behavior of its users, and it also plays an important role in ensuring network security. Because storing or processing log data on a single host cannot meet the requirements of large-scale data analysis, this paper proposes a big-data clustering method for Web logs based on the Map/Reduce distributed computing framework. The experiments are conducted on the Hadoop platform. The relations and rules present in the logs are examined and analyzed to extract potentially useful information. The method enables efficient storage, management, and mining of large-scale Web logs.
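The abstract does not say which clustering algorithm is used, so the following is only an illustrative sketch of how a clustering step can be phrased as map and reduce phases. It assumes k-means, and it assumes each log record has already been turned into a numeric feature tuple (for example, requests per session and mean response size, both hypothetical features). It runs in a single Python process rather than on Hadoop, and all function names are invented for illustration.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for each record."""
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # centroids with no assigned points stay unchanged
    for idx, pts in groups.items():
        updated[idx] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat map + reduce until the centroids stop moving or the budget ends."""
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical feature vectors standing in for preprocessed Web log records.
sample_points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]
final_centroids = kmeans(sample_points, initial_centroids)
```

On a real cluster, the map step would run in parallel over splits of the log file and the reduce step would receive points grouped by centroid index; each iteration would then be one Map/Reduce job, with the updated centroids passed to the next job.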