Monitoring Aspects of Cloud over the Big Data Analytics Using the Hadoop for Managing Short Files

2015 
This paper presents a review of cloud computing and big data analytics using Hadoop. Hadoop is an open-source tool for storing unstructured data; it can also be described as the engineering side of big data, which is otherwise largely predictive analysis, and it is mainly used for the processing and analysis of data. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model for parallel processing of that data. Hadoop is designed to handle large files, so it suffers a performance penalty when dealing with a large number of short files: they place a heavy memory burden on the NameNode of HDFS and increase MapReduce execution time. This work introduces HDFS, describes the short file problem, and surveys existing ways to deal with it. Nowadays storage itself is not the big issue; the issues are how to make sense of the data and how to convince industry that the cloud is safe.
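The abstract refers to existing ways of dealing with the short (small) file problem. One widely used technique in the Hadoop ecosystem is packing many small files into a single SequenceFile keyed by file name, so the NameNode tracks one large file instead of thousands of tiny ones. Below is a minimal sketch of that approach using the standard Hadoop Java API; the class name SmallFilePacker and the argument layout are illustrative assumptions, not something taken from the paper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);   // directory of small files on HDFS
        Path outputSeq = new Path(args[1]);  // single SequenceFile to produce

        // One SequenceFile entry per small file: key = file name, value = raw bytes.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(outputSeq),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (!status.isFile()) {
                    continue; // skip subdirectories
                }
                byte[] contents = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    in.readFully(0, contents);
                }
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}
```

Run with a (hypothetical) input directory and output path, e.g. hadoop jar packer.jar SmallFilePacker /data/small /data/packed.seq. A MapReduce job can then read the packed SequenceFile as one splittable input, avoiding the per-file NameNode metadata and per-split task overhead the abstract describes.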