STREAMSCOPE: continuous reliable distributed processing of big data streams

2016 
STREAMSCOPE (or STREAMS) is a reliable distributed stream computation engine that has been deployed in shared 20,000-server production clusters at Microsoft. STREAMS provides a continuous temporal stream model that allows users to express complex stream processing logic naturally and declaratively. STREAMS supports business-critical streaming applications that can process tens of billions (or tens of terabytes) of input events per day continuously with complex logic involving tens of temporal joins, aggregations, and sophisticated user-defined functions, while maintaining tens of terabytes in-memory computation states on thousands of machines. STREAMS introduces two abstractions, rVertex and rStream, to manage the complexity in distributed stream computation systems. The abstractions allow efficient and flexible distributed execution and failure recovery, make it easy to reason about correctness even with failures, and facilitate the development, debugging, and deployment of complex multi-stage streaming applications.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    40
    References
    58
    Citations
    NaN
    KQI
    []