Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark

2017 
We focus on three performance indicators: execution time, resource utilization and scalability. We conducted realistic log file analysis experiments in both frameworks. We proposed a power consumption model and a utilization-based cost estimation. We experimentally confirmed that Spark performs best.

Log files are generated in many different formats by a plethora of devices and software. Proper analysis of these files can yield useful information about various aspects of each system. Cloud computing appears to be well suited to this type of analysis, as it can handle the high production rate, large size and diversity of log files. In this paper we investigate log file analysis with the cloud computing frameworks Apache Hadoop and Apache Spark. We developed realistic log file analysis applications in both frameworks and ran SQL-type queries on real Apache Web Server log files. Various experiments were performed with different parameters in order to study and compare the performance of the two frameworks.
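To make the idea of a SQL-type query over Apache Web Server logs concrete, the following is a minimal single-machine sketch in Python. It is not the authors' Hadoop or Spark application: it assumes the logs follow the Apache Common Log Format, uses the standard-library `sqlite3` module in place of a distributed engine, and the names `parse_line`, `status_counts`, `access_log` and the sample lines are hypothetical, chosen for illustration only.

```python
import re
import sqlite3

# Assumed layout (Apache Common Log Format):
# host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a tuple of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # A "-" in the size field means no body was sent; store it as 0 bytes.
    size = 0 if match.group("size") == "-" else int(match.group("size"))
    return (
        match.group("host"),
        match.group("time"),
        match.group("method"),
        match.group("path"),
        int(match.group("status")),
        size,
    )


def status_counts(lines):
    """Load parsed lines into an in-memory table and count requests per status code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log "
        "(host TEXT, time TEXT, method TEXT, path TEXT, status INTEGER, size INTEGER)"
    )
    # Malformed lines are skipped rather than aborting the whole load.
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT status, COUNT(*) FROM access_log GROUP BY status ORDER BY status"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample input, including one malformed line.
SAMPLE = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '192.168.0.7 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]

if __name__ == "__main__":
    for status, count in status_counts(SAMPLE):
        print(status, count)
```

In a distributed setting the same parse-then-aggregate shape would presumably be expressed in the framework's own terms, for example as a map and reduce step or as a query over a distributed table; the sketch only illustrates the kind of query involved.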