P-DOT: A model of computation for big data

2013 
In response to the high demand for big data analytics, several programming models for large distributed cluster systems have been proposed and implemented, such as MapReduce, Dryad and Pregel. However, compared with high-performance computing, the basis and principles of the computation and communication behavior of big data analytics are not well studied. In this paper, we review the current big data computational models DOT and DOTA, and propose a more general and practical model, p-DOT (p-phases DOT). p-DOT is not a simple extension; it is significant in two respects. In general terms, any big data analytics job execution expressed in the DOT model or the BSP model can be represented by it; in practical terms, it considers I/O behavior when evaluating performance overhead. Moreover, we provide a cost function implying that the optimal number of machines is near-linear in the square root of the input size for a fixed algorithm and workload, and demonstrate the effectiveness of the function through several experiments.
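To make the square-root claim concrete, the following is a toy sketch only, not the p-DOT cost function, which this abstract does not state. It assumes a hypothetical cost with two terms: parallelizable work that shrinks as `a * w / n`, and a per-machine overhead that grows as `b * n`, where `w` is the input size and `n` the number of machines. The names `toy_cost` and `toy_optimal_machines` and the constants `a` and `b` are illustrative assumptions. Under this assumed form, the minimizer is `n = sqrt(a * w / b)`, so quadrupling the input doubles the optimal machine count.

```python
import math


def toy_cost(n_machines: float, input_size: float, a: float = 1.0, b: float = 1.0) -> float:
    """Hypothetical cost (NOT the paper's function): a*w/n work plus b*n overhead."""
    return a * input_size / n_machines + b * n_machines


def toy_optimal_machines(input_size: float, a: float = 1.0, b: float = 1.0) -> float:
    """Minimizer of toy_cost: setting d/dn (a*w/n + b*n) = 0 gives n = sqrt(a*w/b)."""
    return math.sqrt(a * input_size / b)


# Each step quadruples the input size; the optimal machine count should double.
for w in (1e4, 4e4, 1.6e5):
    n_star = toy_optimal_machines(w)
    print(f"input={w:.0f}  n*={n_star:.1f}  cost={toy_cost(n_star, w):.1f}")
```

Any cost with one term decreasing in `1/n` and one term increasing linearly in `n` behaves this way; whether the actual p-DOT cost function has exactly this shape is not something the abstract confirms.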