Modeling Parallelization Overheads for Predicting Performance

2016 
Legacy codes were largely written for single-core processors. With the proliferation of multicore processors, end users often want to migrate applications to new platforms to improve performance or reduce execution time. Migration from a single-core processor to a multicore one is an expensive proposition, so end users often want an estimate of the possible performance benefit before actually migrating. Parallelizing an application introduces overheads from the very constructs that enable parallelization, and these overheads reduce the application's performance. In this paper, we analyze the overheads caused by OpenMP parallelization constructs. We further provide guidelines to programmers on how to reduce these overheads and maximize the performance benefits of parallelization. We begin our analysis with motivational examples, build a model, and then validate the model against benchmark codes. Our experiments show that the following factors affect overheads: 1) the type and scope of arrays, 2) array accesses with respect to the overall data flow, 3) the number of iterations, and 4) the chunk sizes used during execution. Based on these experiments we propose a mathematical model that predicts the number of cycles these overheads consume, and we use the model to predict the overheads of four benchmark codes. Our results show that the error between the predicted and observed cycle counts is 8.22% on average.