RIOS: Runtime Integrated Optimizer for Spark

2018 
Many Data-Intensive Scalable Computing (DISC) systems do not support sophisticated cost-based query optimizers because they lack the necessary data statistics. Consequently, many crucial optimizations, such as join ordering and plan selection, are not well supported in DISC systems. RIOS is a Runtime Integrated Optimizer for Spark that lazily binds to execution plans at runtime, after collecting the statistics needed to make better decisions. We evaluate the efficacy of our approach and show that better plans can be derived at runtime, achieving more than an order-of-magnitude performance improvement over the compile-time plans produced by the Apache Spark rule-based optimizer.
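
As a rough illustration of the idea of deferring plan choices until statistics are available, the sketch below orders joins by row counts observed at runtime and picks a join strategy per relation. It is not the RIOS implementation and does not use any Spark API: `RuntimeStats`, `order_joins`, and the broadcast threshold are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: relations at or below this row count are broadcast.
BROADCAST_THRESHOLD_ROWS = 1000


@dataclass(frozen=True)
class RuntimeStats:
    """Statistics gathered while the job runs (hypothetical structure)."""
    name: str
    rows: int  # row count observed at runtime, not a compile-time estimate


def order_joins(stats, broadcast_threshold=BROADCAST_THRESHOLD_ROWS):
    """Return (starting relation, [(relation, strategy), ...]).

    Relations are joined smallest-first, and each joined relation is
    broadcast when its observed size is at or below the threshold.
    """
    ordered = sorted(stats, key=lambda s: s.rows)
    steps = []
    for rel in ordered[1:]:
        strategy = "broadcast" if rel.rows <= broadcast_threshold else "shuffle"
        steps.append((rel.name, strategy))
    return ordered[0].name, steps


if __name__ == "__main__":
    observed = [
        RuntimeStats("orders", 5_000_000),
        RuntimeStats("nation", 25),
        RuntimeStats("customer", 150_000),
        RuntimeStats("region", 5),
    ]
    start, steps = order_joins(observed)
    print(start, steps)
```

A purely rule-based planner would fix the join order and strategy before any of these counts are known; the point of the sketch is only that both choices can change once real sizes have been observed.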