SAQP++: Bridging the Gap between Sampling-Based Approximate Query Processing and Aggregate Precomputation

2018 
In the booming Big Data era, interactive data analytics is becoming a regular demand for data scientists. As data continues to grow, responding to aggregation queries in a timely manner is becoming increasingly challenging. To address this challenge, researchers have proposed two separate methods: sampling-based approximate query processing (SAQP) and aggregate precomputation (Materialization), such as data cubes. In this paper, we propose a novel framework, SAQP++, which combines sampling with precomputed aggregates to achieve both relatively accurate results and acceptable preparation time. Using the SUM aggregate function as an example, we propose an optimal materialization solution under a uniform distribution and a hill-climbing-based materialization algorithm under non-uniform distributions. Our experiments show that SAQP++ achieves a more flexible and better trade-off among preprocessing cost, query response time, and answer quality than either SAQP or Materialization alone.
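The abstract does not spell out how sampling and precomputed aggregates are combined, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a range SUM over row positions, precomputed prefix sums at fixed block boundaries, and a uniform sample drawn without replacement. The query is answered as the exact precomputed sum over a nearby block-aligned range, plus a sample-based estimate of the difference between the two ranges. All names (`build_block_prefix`, `draw_sample`, `estimate_sum`) are hypothetical.

```python
import random


def build_block_prefix(values, block):
    """Precompute prefix[k] = SUM(values[0 : min(k * block, N)]) for each block boundary."""
    prefix = [0.0]
    for start in range(0, len(values), block):
        prefix.append(prefix[-1] + sum(values[start:start + block]))
    return prefix


def draw_sample(values, n, seed=0):
    """Uniform sample of n (row position, value) pairs, without replacement."""
    rng = random.Random(seed)
    return [(i, values[i]) for i in rng.sample(range(len(values)), n)]


def estimate_sum(lo, hi, prefix, block, sample, n_rows):
    """Estimate SUM(values[lo:hi]) as precomputed part + sampled difference.

    Assumption (not from the abstract): the precomputed range is obtained by
    snapping lo and hi to a nearby block boundary.
    """
    last = len(prefix) - 1
    k_lo = min((lo + block // 2) // block, last)
    k_hi = max(min((hi + block // 2) // block, last), k_lo)
    p_lo = min(k_lo * block, n_rows)
    p_hi = min(k_hi * block, n_rows)

    # Exact answer for the block-aligned range [p_lo, p_hi).
    exact_part = prefix[k_hi] - prefix[k_lo]

    # Sample-based estimate of SUM over [lo, hi) minus SUM over [p_lo, p_hi).
    scale = n_rows / len(sample)
    diff = 0.0
    for i, v in sample:
        in_query = lo <= i < hi
        in_precomputed = p_lo <= i < p_hi
        diff += v * (int(in_query) - int(in_precomputed))

    return exact_part + scale * diff


if __name__ == "__main__":
    data = [float(v) for v in range(100)]
    prefix = build_block_prefix(data, block=10)
    sample = draw_sample(data, n=20, seed=1)
    print(estimate_sum(23, 57, prefix, 10, sample, len(data)))
```

In this sketch, only the rows in the narrow gaps between the query range and the block-aligned range contribute sampling error. A query that happens to align with block boundaries is answered exactly from the precomputed sums. Smaller blocks shrink the gaps but increase precomputation cost, which is one way to picture the trade-off the abstract describes.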