A Simulator for Intelligent Workload Managers in Heterogeneous Clusters

2021 
Modern High Performance Computing (HPC) clusters often comprise a large number of computing resources with different capabilities, which makes them heterogeneous and difficult to manage. In addition, they must handle a wide range of applications with different requirements. This poses a great challenge to the workload managers that assign applications to resources. Many new proposals aim to overcome this challenge, including some that employ Deep Reinforcement Learning (DRL) techniques. This paper proposes a novel simulation framework for the study of workload managers, conceived in particular to support research on those based on DRL techniques. Its main features include the simulation of heterogeneous clusters based on multicore architectures, taking into account contention in shared memory access and energy consumption. The accuracy and performance of the simulator were validated against a real environment based on Slurm. The results show good accuracy, with a relative error below 5% in makespan and below 10% in energy consumption, and speedups of up to 200.