The Case for Storage Optimization Decoupling in Deep Learning Frameworks

2021 
Deep Learning (DL) training requires efficient access to large collections of data, leading DL frameworks to implement individual I/O optimizations to take full advantage of storage performance. However, these optimizations are intrinsic to each framework, limiting their applicability and portability across DL solutions, while making them inefficient for scenarios where multiple applications compete for shared storage resources.We argue that storage optimizations should be decoupled from DL frameworks and moved to a dedicated storage layer. To achieve this, we propose a new Software-Defined Storage architecture for accelerating DL training performance. The data plane implements self-contained, generally applicable I/O optimizations, while the control plane dynamically adapts them to cope with workload variations and multi-tenant environments.We validate the applicability and portability of our approach by developing and integrating an early prototype with the TensorFlow and PyTorch frameworks. Results show that our I/O optimizations significantly reduce DL training time by up to 54% and 63% for TensorFlow and PyTorch baseline configurations, while providing similar performance benefits to framework-intrinsic I/O mechanisms provided by TensorFlow.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    55
    References
    0
    Citations
    NaN
    KQI
    []