Assessing Differences in Large Spatio-temporal Climate Datasets with a New Python package

2020 
Output data from modern Earth system model simulations consume increasingly massive amounts of storage, and storing these climate model data is not economically sustainable. Previous work has motivated lossy compression as a potential solution because it achieves greater compression ratios than lossless compression. This further reduction comes at the cost of a loss of information, so care must be taken to avoid introducing artifacts in the data that could affect scientific conclusions. In this paper we introduce a Python package designed to aid in the analysis of differences in large spatio-temporal datasets, such as those produced by global climate models. While the new package is agnostic to the source of the differences, our motivation is to enable climate scientists to more easily assess the effects of lossy data compression by visualizing and computing derived spatio-temporal quantities that compare lossily compressed datasets to the original dataset. Because Python is quickly becoming the tool of choice for scientific data analysis in the geoscience community, the new package makes use of the Python software stack in Pangeo (an active NSF-funded community platform for Big Data geoscience). Interoperability with other Pangeo software tools means that the new package integrates easily into climate scientists' post-processing and analysis workflows, which we hope will facilitate the adoption of lossy compression by the climate modeling community.
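To make the kind of comparison described above concrete, the following is a minimal sketch and not the package's API. The function name `compare_fields`, the toy values, and the use of rounding as a stand-in for a lossy compressor are all assumptions made for illustration. It computes three common difference measures between an original and a reconstructed series.

```python
import math


def compare_fields(original, reconstructed):
    """Return simple difference metrics for two equally sized float sequences.

    Hypothetical helper for illustration only; not taken from the package.
    """
    if len(original) != len(reconstructed):
        raise ValueError("fields must have the same number of values")
    n = len(original)

    # Pointwise differences between reconstructed and original values.
    diffs = [r - o for o, r in zip(original, reconstructed)]
    max_abs_error = max(abs(d) for d in diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    # Pearson correlation, computed directly from its definition.
    mean_o = sum(original) / n
    mean_r = sum(reconstructed) / n
    cov = sum((o - mean_o) * (r - mean_r) for o, r in zip(original, reconstructed))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_r = sum((r - mean_r) ** 2 for r in reconstructed)
    pearson = cov / math.sqrt(var_o * var_r)

    return {"max_abs_error": max_abs_error, "rmse": rmse, "pearson": pearson}


# Toy surface-temperature-like series in kelvin (made-up values).
original = [287.123, 288.456, 289.789, 290.012, 286.345]

# Assumption: rounding to one decimal place stands in for lossy compression.
reconstructed = [round(v, 1) for v in original]

metrics = compare_fields(original, reconstructed)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

In a real workflow the same quantities would be computed over full gridded fields, per time step or per grid point, rather than over a short list.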