Balancing data for generalizable machine learning to predict glass-forming ability of ternary alloys

2022 
Abstract. Machine learning has thrived on the emergence of data-driven materials science. However, the materials datasets produced by existing research efforts suffer from significant imbalance. This paper investigates data imbalance in glass-forming ability datasets for ternary alloy systems, which consist of abundant, low-fidelity high-throughput data and sparse, high-fidelity traditional experimental data. We demonstrate a new method for handling the data imbalance and train artificial neural network (ANN) models on the original and the balanced datasets. The ANN model trained on the balanced dataset resolved the overfitting suffered by the model trained on the original dataset. More importantly, the data-balanced model generalized better when predicting new alloy systems, as evidenced by leave-one-alloy-system-out validation. Our work highlights the importance of handling data imbalance in materials datasets, both to resolve overfitting in machine learning models and to further enhance generalizability in predicting the characteristics of new material systems.
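The leave-one-alloy-system-out validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each data point carries a label naming its ternary system; the function name `leave_one_system_out` and the example labels are hypothetical and are not taken from the paper. Each split holds out every point from one system and trains on the rest, so the test score reflects performance on a system the model has never seen.

```python
from collections import defaultdict


def leave_one_system_out(systems):
    """Yield (held_out_system, train_indices, test_indices), one split per system."""
    by_system = defaultdict(list)
    for i, s in enumerate(systems):
        by_system[s].append(i)
    # Iterate in sorted order so the splits are deterministic.
    for held_out in sorted(by_system):
        test_idx = by_system[held_out]
        train_idx = [i for i, s in enumerate(systems) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical alloy-system label for each data point (illustration only).
systems = ["Cu-Zr-Al", "Cu-Zr-Al", "Ni-Nb-Ti", "Mg-Cu-Y", "Ni-Nb-Ti"]

splits = list(leave_one_system_out(systems))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In practice, a model would be fit on the training indices of each split and scored on the held-out system, and the per-system scores would then be compared between the original and the balanced datasets.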