Enhancing Leakage Prevention for MapReduce

2022 
As public clouds become the platform of choice for MapReduce processing, users are placing higher demands on the privacy of job data and programs. A number of solutions employ trusted hardware to protect MapReduce tasks. However, existing work has pointed out that simply protecting individual nodes in the MapReduce cluster with trusted hardware and protecting cross-node communication with encryption still leaks information through side channels. Specifically, attackers can infer information about the data by observing and manipulating cross-node communication traffic volumes. Although existing work has proposed solutions to prevent such leakage, in this paper we show that these solutions still leak critical job information. Additionally, our study shows that previous solutions have further limitations, including data restrictions, partition function restrictions, reliability issues, and high overheads. To address all the discovered limitations, we introduce the Strong Shuffle solution. Our analysis and experimental results show that our solution reduces the information leakage and addresses the other discovered limitations. To support Strong Shuffle, we propose a Bloom Filter variant named Group-based Dynamic Bloom Filter (GDBF). Our theoretical analysis shows that GDBF has lower performance and storage overhead than the traditional Scalable Bloom Filter.
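The traffic-volume side channel described above can be illustrated with a minimal sketch. It is not the paper's Strong Shuffle and does not model trusted hardware or encryption; it only assumes a hash partitioner and an observer who can count the records flowing from one mapper to each reducer. The function names (`partition`, `shuffle_volumes`, `padded_volumes`), the sample records, and the naive padding step are all hypothetical, chosen for illustration.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioner (crc32 keeps runs reproducible).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_volumes(records, num_reducers):
    """Count how many records one mapper sends to each reducer.

    Even if payloads are encrypted, these counts are what a network
    observer sees as per-flow traffic volume.
    """
    volumes = [0] * num_reducers
    for key, _value in records:
        volumes[partition(key, num_reducers)] += 1
    return volumes


def padded_volumes(volumes):
    """Naive mitigation (assumption, not the paper's method):
    pad every flow with dummy records up to the largest flow."""
    target = max(volumes, default=0)
    return [target for _ in volumes]


# A skewed key distribution: one key dominates the input.
records = [("alice", 1)] * 90 + [("bob", 1)] * 10
observed = shuffle_volumes(records, num_reducers=4)
padded = padded_volumes(observed)

print("observed per-reducer volumes:", observed)
print("padded per-reducer volumes:  ", padded)
```

In the unpadded case, one flow carries at least 90 of the 100 records, which reveals the skew in the key distribution without decrypting anything. Uniform padding hides that skew at the cost of extra traffic, which is the kind of overhead trade-off the abstract refers to.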