Data Science for Geoscience: Recent Progress and Future Trends from the Perspective of a Data Life Cycle

2021 
Data science is receiving increasing attention in a variety of geoscience disciplines and applications. Many successful data-driven geoscience discoveries have been reported recently, and the number of geoinformatics and data science sessions has begun to increase at many geoscience conferences. Across academia, industry, and government, there is strong interest in learning more about the current progress and the potential of data science for geoscience. To address that need, this article provides a review from the perspective of a data life cycle. The key steps in the data life cycle include concept, collection, preprocessing, analysis, archive, distribution, discovery, and repurpose. These subjects are intuitive and easy to follow even for geoscientists with very limited experience in cyberinfrastructure, statistics, and machine learning. The review has two key parts. The first covers the fundamental concepts and theoretical foundation of data science, and the second summarizes highlights and sharable experience from existing publications, organized around each step in the data life cycle. The review closes with a vision of future trends in data science applications in geoscience, including open science, smart data, and the science of team science. We hope this review will be useful to data science practitioners in the geoscience community and will lead to more discussion of best practices and future trends in data science for geoscience.