Challenges and solutions to improve the scalability of an operational regional meteorological forecasting model

2011 
This work investigates the parallel scalability of BRAMS, a limited-area weather forecasting production code, from O(100) cores to O(1,000) cores on large grids (20 km and 10 km resolution runs over South America). Initial experiments show a lack of scalability at modest core counts. Execution time profiling and source code examination reveal the causes of the limited scalability: sequential algorithms and extensive memory requirements in rarely executed phases of the computation. As the processor count increases, these 'secondary' phases dominate execution time. Replacing the algorithms and reducing memory use yields a new code version that exhibits both strong and weak scaling. The new version achieves a speed-up of 6 from 100 to 700 processors on the 20 km resolution grid and a speed-up of 6.9 over the same processor range on the 10 km resolution grid. Results were confirmed on another machine with a distinct architecture. Further experiments show that the scalability of the 20 km resolution case is limited by load imbalance in the most demanding computational phase.
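To put the reported speed-ups in context, the short sketch below converts them into parallel efficiency relative to the 100-processor baseline. It assumes the ideal speed-up is simply the ratio of processor counts (7 for 100 to 700). The function name relative_efficiency is illustrative and does not come from the paper or from BRAMS.

```python
def relative_efficiency(speedup: float, base_procs: int, scaled_procs: int) -> float:
    """Measured speed-up divided by the ideal speed-up.

    Assumption: the ideal speed-up equals the ratio of processor counts.
    """
    ideal_speedup = scaled_procs / base_procs
    return speedup / ideal_speedup


# Speed-ups reported in the abstract for 100 -> 700 processors.
eff_20km = relative_efficiency(6.0, 100, 700)
eff_10km = relative_efficiency(6.9, 100, 700)

print(f"20 km grid: {eff_20km:.1%}")  # 85.7%
print(f"10 km grid: {eff_10km:.1%}")  # 98.6%
```

Under this assumption the 10 km grid runs close to ideal, while the 20 km grid loses roughly 14% relative to ideal, which is consistent with the statement that the 20 km case is limited by load imbalance.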