Massive Small File Storage Scheme Based on Association Rule Mining

2021 
When HDFS stores a large number of small files, the NameNode runs short of memory and read latency rises, which makes the NameNode a system bottleneck and seriously degrades the file processing capability and user experience of HDFS. This article proposes HDFSA (Hadoop distributed file system based on association rule mining), a storage architecture that addresses both problems. HDFSA uses a Redis cluster to cache small files, merges them, and uploads the merged files to the DataNode; when small files are downloaded, it uses association rule mining algorithms to preread strongly associated files, which improves the efficiency of reading massive numbers of small files. The experiments measure the system's memory occupancy rate and its upload and download times, and the results show that the proposed storage strategy reduces the memory occupancy of the NameNode, improves the efficiency of reading a large number of small files, and reduces the read latency of HDFS.
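The preread step can be illustrated with a minimal sketch. The abstract does not say which association rule mining algorithm HDFSA uses, so the code below is only an assumption: it mines pairwise rules from hypothetical access sessions using the standard support and confidence measures, and returns the files that would be preread alongside a requested file. The function names, thresholds, and sample file names are all illustrative and do not come from the article.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Mine pairwise rules "src -> dst" from file access sessions.

    Hypothetical helper, not the article's algorithm.
    support(a, b)  = sessions containing both a and b / all sessions
    confidence(a -> b) = sessions containing both / sessions containing a
    Returns {src: [(dst, confidence), ...]} sorted by descending confidence.
    """
    n = len(sessions)
    if n == 0:
        return {}

    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        files = sorted(set(session))  # count each file once per session
        item_counts.update(files)
        pair_counts.update(combinations(files, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # pair is not frequent enough
        for src, dst in ((a, b), (b, a)):
            confidence = count / item_counts[src]
            if confidence >= min_confidence:
                rules.setdefault(src, []).append((dst, confidence))

    for src in rules:
        rules[src].sort(key=lambda rule: (-rule[1], rule[0]))
    return rules


def files_to_preread(requested, rules):
    """Return the files strongly associated with the requested file."""
    return [dst for dst, _ in rules.get(requested, [])]


# Assumed sample data: each inner list is one user's download session.
sessions = [
    ["a.log", "b.log", "c.log"],
    ["a.log", "b.log"],
    ["a.log", "b.log", "d.log"],
    ["c.log", "d.log"],
]
rules = mine_pair_rules(sessions)
```

With these sample sessions, `files_to_preread("a.log", rules)` yields `["b.log"]`, because the two files appear together in three of four sessions, while `c.log` has no partner that meets the support threshold. A real deployment would mine larger itemsets and would need to cap the number of files preread per request.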