Niijima: sound and automated computation consolidation for efficient multilingual data-parallel pipelines

2019 
Multilingual data-parallel pipelines, such as Microsoft's Scope and Apache Spark, are widely used in real-world analytical tasks. While the involvement of multiple languages (often including both managed and native languages) provides much convenience in data manipulation and transformation, it comes at a performance cost --- managed languages need a managed runtime, incurring much overhead. In addition, each switch from a managed to a native runtime (and vice versa) requires marshalling or unmarshalling of an ocean of data objects, taking a large fraction of the execution time. This paper presents Niijima, an optimizing compiler for Microsoft's Scope/Cosmos, which can consolidate C#-based user-defined operators (UDOs) across SQL statements, thereby reducing the number of dataflow vertices that require the managed runtime, and thus the amount of C# computations and the data marshalling cost. We demonstrate that Niijima has reduced job latency by an average of 24% and up to 3.3x, on a series of production jobs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    1
    Citations
    NaN
    KQI
    []