XData: A General-purpose Unified Processing System for Data Analysis and Machine Learning

2021 
With the rapid development and widespread application of data science, users need to mine value from large amounts of data. Many new data processing systems have been proposed and put into practice. Each system usually targets a specialized domain, but in a real data analysis scenario the different techniques are hard to separate, and several kinds of processing often need to be unified. This paper proposes XData, a new general-purpose unified processing system for data analysis and machine learning. It is designed to support a variety of operators in a uniform statement, such as data retrieval, data aggregation, and machine learning modeling. With its pipeline mechanism and data processing language (DPL), developers can easily combine multiple processing steps in an SQL-like form to implement end-to-end applications. XData also takes advantage of the intermediate results in the pipeline, and automatically optimizes performance at the logical plan, physical topology, and task executor levels. It has been put into use in practical engineering, and the evaluations show that performance can be improved compared to traditional processing techniques.
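The pipeline idea described in the abstract can be illustrated with a minimal Python sketch. This is not XData's DPL or its implementation, neither of which is shown here; the `Pipeline` class, the operator names, and the sample rows are all hypothetical. The sketch only shows how retrieval, aggregation, and a trivial modeling step might be chained in one pipeline, and how a cached intermediate result could be reused by a second pipeline that shares the same leading operators.

```python
from typing import Any, Callable, Dict, List, Tuple

# An operator is a (name, function) pair; both are illustrative, not XData's API.
Operator = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Hypothetical pipeline that caches the output of every stage.

    The cache key is the tuple of operator names executed so far, so the
    sketch assumes every run starts from the same input data.
    """

    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, ...], Any] = {}
        self.executed: List[str] = []  # names of operators actually computed

    def run(self, data: Any, operators: List[Operator]) -> Any:
        key: Tuple[str, ...] = ()
        result = data
        for name, fn in operators:
            key = key + (name,)
            if key in self.cache:
                # Reuse the intermediate result of an identical prefix.
                result = self.cache[key]
            else:
                result = fn(result)
                self.cache[key] = result
                self.executed.append(name)
        return result


def retrieve(rows: List[dict]) -> List[dict]:
    # Data retrieval: keep only rows with a positive amount.
    return [r for r in rows if r["amount"] > 0]


def aggregate(rows: List[dict]) -> Dict[str, float]:
    # Data aggregation: total amount per user.
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals


def fit_mean(totals: Dict[str, float]) -> float:
    # Stand-in for a modeling step: a mean baseline over the per-user totals.
    return sum(totals.values()) / len(totals)


def top_user(totals: Dict[str, float]) -> str:
    # A second analysis that shares the retrieve + aggregate prefix.
    return max(totals, key=lambda user: totals[user])


rows = [
    {"user": "a", "amount": 10.0},
    {"user": "a", "amount": 20.0},
    {"user": "b", "amount": -5.0},
    {"user": "b", "amount": 50.0},
]

pipe = Pipeline()
shared = [("retrieve", retrieve), ("aggregate", aggregate)]
baseline = pipe.run(rows, shared + [("fit_mean", fit_mean)])
best = pipe.run(rows, shared + [("top_user", top_user)])
```

In this sketch the second run computes only `top_user`; the retrieval and aggregation stages are served from the cache. A real system would also have to key the cache on the input data and operator parameters, which this example deliberately omits.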