HW3C: A Heuristic based Workload Classification and Cloud Configuration Approach for Big Data Analytics

2018 
Choosing the best cloud configuration for recurring big data analytics jobs running in clouds is a major challenge. Prior approaches, such as CherryPick, may settle on a sub-optimal configuration because they must search a broad spectrum of cloud configurations with only a few test runs. We present HW3C, a heuristic-based workload classification and cloud configuration system for big data analytics jobs. Our insight is to classify a job by comparing its resource preference and usage information with those of other jobs, and then to use heuristic rules to distinguish bad samples from good ones in the Bayesian Optimization algorithm. Our experiments on HiBench and SparkBench in Aliyun ECS show that, compared with CherryPick, HW3C improves job performance by 53% on average while reducing resource cost by 40% on average.
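The two ideas named above, classifying a job against known resource-usage profiles and using rules to discard unpromising configurations before an optimizer samples them, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the profile vectors, the class labels, the memory-per-core thresholds, and the function names are hypothetical and are not taken from HW3C, whose actual features and rules are not given in this abstract. The Bayesian Optimization loop itself is omitted; the filtered list stands in for the candidate set it would draw from.

```python
import math

# Hypothetical reference profiles: (CPU, memory, disk I/O, network) usage,
# each normalized to [0, 1]. These values are illustrative, not measured.
KNOWN_PROFILES = {
    "cpu_bound": (0.9, 0.3, 0.2, 0.1),
    "memory_bound": (0.4, 0.9, 0.2, 0.2),
    "io_bound": (0.3, 0.3, 0.9, 0.4),
}


def classify(profile):
    """Return the label of the reference profile nearest to `profile`."""
    return min(
        KNOWN_PROFILES,
        key=lambda label: math.dist(profile, KNOWN_PROFILES[label]),
    )


def is_bad_sample(job_class, config):
    """Apply an assumed heuristic rule to a (vcpus, memory_gb) configuration."""
    vcpus, memory_gb = config
    ratio = memory_gb / vcpus  # GB of memory per vCPU
    if job_class == "memory_bound":
        return ratio < 4  # assumed: too little memory per core
    if job_class == "cpu_bound":
        return ratio > 4  # assumed: paying for memory the job will not use
    return False  # no rule for other classes in this sketch


def filter_candidates(job_class, configs):
    """Keep only configurations that the heuristic does not reject."""
    return [c for c in configs if not is_bad_sample(job_class, c)]


if __name__ == "__main__":
    job_class = classify((0.85, 0.35, 0.1, 0.1))
    candidates = [(4, 8), (4, 32), (8, 64)]
    print(job_class, filter_candidates(job_class, candidates))
```

Pruning before sampling is one way to spend a small budget of test runs on configurations that match the job's resource preference; whether HW3C prunes up front or reweights samples inside the optimizer is not stated here.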