Reducing Fault-tolerant Overhead for Distributed Stream Processing with Approximate Backup

2020 
The stream processing model continuously processes online data in a one-pass fashion, which can make it more vulnerable to failures than offline data processing schemes. Checkpoint-based fault-tolerant methods have been widely used to enhance the reliability of stream processing systems. To ensure exact data recovery upon failures, full-backup mechanisms store a complete copy of the data, which introduces substantial runtime overhead and increases output latency. Meanwhile, a wide range of online processing applications prefer quick-and-dirty results with a slight degradation in accuracy to delayed exact results. This paper introduces a novel approximate fault-tolerant problem (OAFP) with the objective of reducing the failure-free fault-tolerant overhead while ensuring a user-defined output accuracy requirement upon failure. We present an approximate fault-tolerant scheme based on a sampling backup mechanism and study the trade-off between fault-tolerant overhead and output accuracy in stream processing systems. We propose two algorithms to compute backup plans for both single-node failure and correlated failure scenarios. Extensive experiments with different types of stream topologies are conducted on our simulator to verify the correctness and effectiveness of our approach. We prove that our solution guarantees the output accuracy requirement with minimum fault-tolerant (FT) latency for directed acyclic graph (DAG) stream topologies with single-node failures.