Shared Execution Techniques for Business Data Analytics over Big Data Streams.

2020 
Business Data Analytics require processing of large numbers of data streams and the creation of materialized views in order to provide near real-time answers to user queries. Materializing the view of each query and refreshing it continuously as a separate query execution plan is not efficient and is not scalable. In this paper, we present a global query execution plan to simultaneously support multiple queries, and minimize the number of input scans, operators, and tuples flowing between the operators. We propose shared-execution techniques for creating and maintaining materialized views in support of business data analytics queries. We utilize commonalities in multiple business data analytics queries to support scalable and efficient processing of big data streams. The paper highlights shared execution techniques for select predicates, group, and aggregate calculations. We present how global query execution plans are run in a distributed stream processing system, called INGA which is built on top of Storm. In INGA, we are able to support online view maintenance of 2500 materialized views using 237 queries by utilizing the shared constructs between the queries. We are able to run all 237 queries using a single global query execution plan tree with depth of 21.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    19
    References
    0
    Citations
    NaN
    KQI
    []