MemEFS: An Elastic In-memory Runtime File System for eScience Applications

2015 
Data-intensive scientific workflows exhibit inter-task dependencies that generate file-based communication schemes. In such scenarios, traditional disk-based storage systems often limit overall application performance and scalability. To overcome this storage bottleneck, in-memory runtime distributed file systems speed up application I/O. These systems are deployed statically onto a fixed number of compute nodes and act as a fast, distributed I/O cache for the data generated at runtime. Static deployment has two major drawbacks. First, the user faces the sometimes difficult task of estimating the size of the generated data in advance, because an incorrect estimate can cause the application to fail. Second, because the data footprint and the achieved parallelism vary significantly during an application's runtime, static deployment also leads to severe resource under-utilization. To address these limitations, we present MemEFS, an elastic in-memory runtime distributed file system. MemEFS scales elastically with application storage demands, acquiring or releasing resources as needed. Our evaluation shows that, while incurring modest runtime overhead, MemEFS increases resource utilization efficiency by up to 65%.