Balancing data for generalizable machine learning to predict glass-forming ability of ternary alloys

Yi Yao, Timothy Sullivan, Feng Yan, Jiaqi Gong, Lin Li

2022

Abstract: Machine learning has thrived with the emergence of data-driven materials science. However, the materials datasets acquired through existing research efforts have significant imbalance issues. This paper investigated data imbalance in a dataset on the glass-forming ability of ternary alloy systems, which consists of abundant, low-fidelity high-throughput data and sparse, high-fidelity traditional experimental data. We demonstrated a new method to handle the data imbalance and trained artificial neural network (ANN) models on both the original and the balanced datasets. The ANN model trained on the balanced dataset resolved the overfitting issue suffered by the model trained on the original dataset. More importantly, the data-balanced model generalized better when predicting a new alloy system, as evidenced by leave-one-alloy-system-out validation. Our work highlights the importance of handling data imbalance in materials datasets to resolve the overfitting issues of machine learning models and to further enhance generalizability in predicting the characteristics of new material systems.
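
The leave-one-alloy-system-out validation named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so the function name `leave_one_system_out` and the alloy-system labels below are hypothetical; the sketch only shows the general idea of holding out every sample from one alloy system at a time, so that the model is always tested on a system it never saw during training.

```python
from collections import defaultdict


def leave_one_system_out(systems):
    """Yield (held_out_system, train_indices, test_indices) per alloy system.

    `systems` holds one alloy-system label per sample. This is an
    illustrative helper, not the authors' code.
    """
    # Group sample indices by the alloy system they belong to.
    by_system = defaultdict(list)
    for idx, system in enumerate(systems):
        by_system[system].append(idx)

    # Hold out each system in turn; train on all remaining systems.
    for held_out in sorted(by_system):
        test_idx = by_system[held_out]
        train_idx = sorted(
            i
            for system, indices in by_system.items()
            if system != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical labels, one per sample (assumed for illustration only).
systems = ["Al-Ni-Zr", "Al-Ni-Zr", "Co-Fe-Zr", "Co-Fe-Zr", "Co-Ti-Zr"]
folds = list(leave_one_system_out(systems))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Each fold would then train a model on `train_idx` and score it on `test_idx`; because no sample from the held-out system appears in training, the score reflects generalization to a new alloy system rather than interpolation within a known one.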
