Efficient Jobs Dispatching In Emerging Clouds

Authors:
Shimon Bitton Technion, Israel
Yuval Emek Technion, Israel
Shay Kutten Technion, Israel

Abstract:

This study was carried in the context of the development of technologies for a cloud that uses an optical network for internal communication. The problem addressed in this paper deals with dispatching jobs-units of work, to be performed by machines on the cloud. Sending (or migrating) a job to a machine involves establishing a lightpath (a la circuit switching); this incurs a significant setup cost, but while it exists, the lightpath's capacity is very high. Hence, moving one job is about as expensive as moving a set of jobs over the same lightpath. Our goal is to develop online network dispatching algorithms for a work conserving job scheduling. That is, consider a set of jobs the dispatcher is responsible for their executions on some set SM of machines. Any machine in the network may join or (when not executing a job) leave SM according to decisions made outside the scope of this paper. Whenever a machine joins the set or is in the set and has just finished executing a job, it issues a request for a new job to perform and the dispatcher must send this machine a job that has not been executed yet (if such exists). Every machine can perform any of the jobs, and each job is performed on a single machine. The main algorithmic challenge in this context boils down to the following questions: How many jobs should we send to a requesting machine (or to some intermediate storage to be distributed from there)? From the storage on which machine should these jobs be taken? The algorithms developed here are shown to be efficient in reducing the costs of establishing lightpaths. As opposed to related algorithms for delivering consumable resources (in other contexts), we prove that our online algorithms are fully competitive. We present randomized online algorithms for two different settings: in the first it is assumed that each message requires establishing a lightpath and thus, incurs a setup cost; in the second, we distinguish between messages that carry job sets and small control messages sent by the algorithm, where the latter type of messages is assumed to be sent over a designated (non-optical) control plane at a negligible cost. Our algorithms are quite simple, though the analysis turns out to be rather involved. They are designed (and rigorously analyzed) for a general architecture, but would be especially efficient in fat tree architectures-the common choice in many data centers. Remark. Due to space limitations, some proofs are omitted from this version of the paper. Refer to [13] for a full version that includes all missing proofs along with some additional material. (For ease of reference, theorem and section numbers in the current version of the paper correspond to their counterparts in the full version.)

You may want to know: