P-DOT: a model of computation for big data

2016 
In response to the high demand for big data analytics, several programming models for large distributed cluster systems have been proposed and implemented, such as MapReduce, Dryad and Pregel. However, compared with high performance computing, the basis and principles of the computation and communication behaviour of big data analytics are not well studied. In this paper, we review the current big data computational models DOT and DOT Advanced, and propose a more general and practical model, p-DOT (p-phases DOT). p-DOT is not a simple extension but a significant one: in terms of generality, any big data analytics job execution expressed in the DOT model or the bulk synchronous parallel model can be represented by it; in terms of practicality, it considers I/O behaviour when evaluating performance overhead. Moreover, we provide a cost function for p-DOT implying that the optimal number of machines is near-linear in the square root of the input size for a fixed algorithm and workload, and show that the processing paradigm of p-DOT is scalable and fault-tolerant. Finally, we demonstrate the effectiveness of the model through several experiments.
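The square-root relationship stated above can be illustrated with a toy cost function. The sketch below is not the p-DOT cost function, which this abstract does not give. It assumes a hypothetical cost of the form a*w/n + b*n, where w is the input size, n the number of machines, a*w/n the per-machine work, and b*n an overhead that grows with cluster size. The names `toy_cost`, `best_machine_count`, `a`, and `b` are illustrative assumptions. Under this assumed form, the minimum falls at n = sqrt(a*w/b), so quadrupling the input only doubles the best machine count.

```python
import math


def toy_cost(n_machines: int, input_size: float, a: float = 1.0, b: float = 1.0) -> float:
    """Hypothetical cost (an assumption, not the paper's formula).

    a * input_size / n_machines models work split evenly across machines;
    b * n_machines models overhead that grows with the cluster size.
    """
    return a * input_size / n_machines + b * n_machines


def best_machine_count(input_size: float, a: float = 1.0, b: float = 1.0,
                       max_machines: int = 10_000) -> int:
    """Brute-force search for the machine count that minimises toy_cost."""
    return min(range(1, max_machines + 1),
               key=lambda n: toy_cost(n, input_size, a, b))


def closed_form_optimum(input_size: float, a: float = 1.0, b: float = 1.0) -> float:
    """Analytic minimiser of the toy cost: sqrt(a * input_size / b)."""
    return math.sqrt(a * input_size / b)


if __name__ == "__main__":
    for w in (10_000, 40_000, 160_000):
        print(w, best_machine_count(w), closed_form_optimum(w))
```

With the default constants, the brute-force search and the closed form agree, and each fourfold increase in input size doubles the selected number of machines. Any real cost model would have additional terms (for example I/O, which the abstract says p-DOT accounts for), so this only shows why a trade-off between divided work and growing overhead can yield a square-root optimum.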