|Xiaoyang Zhang||Huazhong University of Science and Technology, P.R. China|
|Yuchong Hu||Huazhong University of Science and Technology, P.R. China|
|Patrick Pakching Lee||The Chinese University of Hong Kong, Hong Kong|
|Pan Zhou||Huazhong University of Science and Technology, P.R. China|
To adapt to increasing storage demands and varying storage redundancy requirements, practical distributed storage systems need to support storage scaling by relocating currently stored data to different storage nodes. However, the scaling process inevitably transfers substantial data traffic over the network, so minimizing the bandwidth cost of scaling is critical in distributed settings. In this paper, we show that optimal storage scaling is achievable in erasure-coded distributed storage based on network coding, by allowing storage nodes to send encoded data during scaling. We formally prove the information-theoretically minimum scaling bandwidth. Based on our theoretical findings, we also build NCScale, a distributed storage system prototype that realizes network-coding-based scaling while preserving the properties necessary for practical deployment. Experiments on Amazon EC2 show that NCScale reduces the scaling time by up to 50% compared with the state of the art.
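As a rough illustration of why sending encoded data can save bandwidth, consider the following toy sketch. It is not the NCScale algorithm: it assumes a simple XOR parity, a single existing node holding three blocks, and a hypothetical new node that only needs the XOR of those blocks. The names `xor_blocks`, `BLOCK_SIZE`, and the block variables are illustrative assumptions, and the traffic ratio it produces is unrelated to the experimental results reported above.

```python
import os


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration

# Three blocks stored on one existing node (random stand-ins for real data).
a, b, c = (os.urandom(BLOCK_SIZE) for _ in range(3))

# Uncoded transfer: the node ships every block and the new node
# computes the parity itself.
sent_uncoded = [a, b, c]
parity_uncoded = xor_blocks(*sent_uncoded)

# Coded transfer: the node combines its blocks locally and ships
# a single encoded block, which already equals the parity.
sent_coded = [xor_blocks(a, b, c)]
parity_coded = sent_coded[0]

bytes_uncoded = sum(len(x) for x in sent_uncoded)
bytes_coded = sum(len(x) for x in sent_coded)
```

Both strategies leave the new node with the same parity block, but in this toy setting the coded transfer moves one block over the network instead of three. The general principle, that a node may combine what it stores before sending, is what the bandwidth analysis in the paper makes precise.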