Interactive Entity Centric Analysis of Log Data

2017 
Interactive entity-centric analysis of log data can yield fine-grained business insights. In this paper, we first describe a fiber-based partitioning method for log data that accelerates subsequent entity-centric analysis. Second, we present our fiber-based partitioner, which is used by the Spark SQL query engine. The partitioner takes the locations of data blocks into account both when loading data from HDFS into RDDs and when shuffling data from upstream operators to downstream operators during joins, which avoids data exchange between nodes and speeds up query processing. Finally, we present experimental results demonstrating that the fiber-based partitioner improves the performance of entity-centric queries.
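
The idea of co-locating all records of an entity so that joins need no data exchange can be illustrated with a minimal sketch. It is not the paper's partitioner: it assumes a fiber is the set of log records sharing one entity key, it uses a plain stable hash in place of HDFS block-location awareness, and the names `fiber_partition`, `partition_by_fiber`, and `local_join` are hypothetical.

```python
import zlib
from collections import defaultdict


def fiber_partition(entity_id, num_partitions):
    """Map an entity key to a partition index with a stable hash (assumed scheme)."""
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


def partition_by_fiber(records, num_partitions):
    """Group (entity_id, payload) records so each entity lands in exactly one partition."""
    partitions = defaultdict(list)
    for entity_id, payload in records:
        pid = fiber_partition(entity_id, num_partitions)
        partitions[pid].append((entity_id, payload))
    return partitions


def local_join(left_parts, right_parts):
    """Join two co-partitioned datasets partition by partition.

    Because both sides use the same partition function, matching records
    always share a partition index, so no cross-partition exchange is needed.
    """
    joined = []
    for pid, left_rows in left_parts.items():
        # Build a per-partition lookup of the right side.
        right_index = defaultdict(list)
        for entity_id, payload in right_parts.get(pid, []):
            right_index[entity_id].append(payload)
        for entity_id, left_payload in left_rows:
            for right_payload in right_index.get(entity_id, []):
                joined.append((entity_id, left_payload, right_payload))
    return joined


# Illustrative data: two log sources keyed by a user id.
clicks = [("u1", "click_a"), ("u2", "click_b"), ("u1", "click_c")]
purchases = [("u1", "buy_x"), ("u3", "buy_y")]

click_parts = partition_by_fiber(clicks, 4)
purchase_parts = partition_by_fiber(purchases, 4)
result = sorted(local_join(click_parts, purchase_parts))
```

In this sketch both datasets are partitioned with the same function, so each partition can be joined independently. A location-aware partitioner as described in the abstract would additionally choose where each partition lives based on data block placement, which the sketch does not model.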