A parallel clustering algorithm for logs data based on Hadoop platform

Jiuyuan Huo, Jian Weng, Hong Qu

2019

Log analysis is an important way to understand the running status of a network system and the behavior of its users, and it is also an important means of ensuring network security. Because storing and processing log data on a single host cannot meet the requirements of large-scale data analysis, this paper proposes a big data clustering method for Web logs based on the Map/Reduce distributed computing framework. The experiments are carried out on the Hadoop platform. The relations and rules present in the logs are examined and analyzed to extract latent information. The method enables efficient storage, management, and mining of large-scale Web logs.
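The abstract does not name the clustering algorithm, so the following is only a minimal sketch of how one clustering iteration can be split into a map step and a reduce step, assuming k-means as a stand-in. The function names (`map_phase`, `reduce_phase`, `kmeans_step`) and the two numeric features per log record are hypothetical and chosen for illustration; the code runs in a single process and does not use Hadoop itself.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit a (cluster_id, point) pair for every input record."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each cluster to get new centroids."""
    groups = defaultdict(list)
    for cluster_id, point in pairs:
        groups[cluster_id].append(point)
    # A centroid whose cluster received no points is left unchanged.
    new_centroids = list(centroids)
    for cluster_id, members in groups.items():
        new_centroids[cluster_id] = tuple(
            sum(dim) / len(members) for dim in zip(*members)
        )
    return new_centroids


def kmeans_step(points, centroids):
    """Run one map/reduce round of k-means (assumed algorithm, not the paper's)."""
    return reduce_phase(map_phase(points, centroids), centroids)


# Hypothetical features per log record: (requests per session, mean response size in KB)
points = [(1.0, 2.0), (1.0, 4.0), (9.0, 8.0), (11.0, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_step(points, centroids)
print(new_centroids)
```

In a distributed setting, the map step would run independently on each block of the log file and the reduce step would receive all pairs that share a cluster id; the round is repeated until the centroids stop moving.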

Keywords:

Data mining
Cluster analysis
Computer science
Network security
Log data
Big data
MapReduce
