A Speculative Parallel Optimization Method for Industrial Big Data Algorithms

2019 
Apache Spark is currently one of the most advanced distributed computing platforms, and a growing number of industrial big data algorithms are implemented on it to achieve better performance. However, some of these algorithms cannot be executed in parallel on Apache Spark because of their complex inner control dependencies. To solve this problem, this paper introduces the Software Thread-Level Speculation technique and proposes a method that optimizes industrial big data algorithms so that they can run in parallel on Apache Spark. Specifically, the proposed method analyzes how complex inner dependencies affect an algorithm's parallelism, speculatively partitions the algorithm into subtasks to overcome these dependencies, and predicts the inputs of the subtasks. After the subtasks are executed in parallel, their results are collected and validated: the correct results are kept and committed, while the incorrect ones are abandoned. In this way, the optimal parallelism for industrial big data algorithms can be achieved. The experiments show that, with the proposed method, the particle swarm optimization algorithm achieves a speedup of 150%-230% compared with the non-speculative version. The proposed optimization method can therefore markedly improve the execution efficiency of poorly parallelized algorithms on Apache Spark.
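The partition, predict, execute, validate, and commit-or-abandon cycle described above can be illustrated with a minimal sketch. This is not the paper's implementation and does not use Apache Spark: it assumes a simple chain in which each stage consumes the output of the previous one, uses a local thread pool in place of cluster executors, and re-executes a stage whose speculative result is abandoned. The names `speculative_chain`, `stages`, and `predicted_inputs` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def speculative_chain(stages, predicted_inputs, initial):
    """Run a dependent chain of stages speculatively (illustrative only).

    stages[i] maps the output of stage i-1 to the output of stage i.
    predicted_inputs[i] is the guessed input of stage i+1, so it has
    len(stages) - 1 entries (stage 0 receives the known initial value).
    Returns (final_value, number_of_parallel_results_kept).
    """
    # Stage 0 has a known input; every later stage starts from a guess.
    guesses = [initial] + list(predicted_inputs)

    # Execute all subtasks in parallel on their (possibly wrong) inputs.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(stage, guess)
                   for stage, guess in zip(stages, guesses)]
        speculative = [future.result() for future in futures]

    # Validate in program order: commit a result only if the guessed input
    # matches the value actually produced by the preceding stage.
    actual = initial
    kept = 0
    for stage, guess, result in zip(stages, guesses, speculative):
        if guess == actual:
            actual = result          # prediction was right: commit
            kept += 1
        else:
            actual = stage(actual)   # prediction was wrong: abandon, re-run
    return actual, kept
```

For example, with stages `x + 1`, `x * 2`, and `x - 3`, an initial value of 1, and predicted inputs `[2, 5]`, the first prediction is correct and its result is committed, while the second is wrong and that stage is re-executed on the true value. The better the input prediction, the more parallel results survive validation; in this sketch a misprediction costs one sequential re-execution.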