Active learning of potential-energy surfaces of weakly-bound complexes with regression-tree ensembles.

2021 
Several pool-based active learning algorithms were employed to model potential energy surfaces (PESs) with a minimum number of electronic structure calculations. Among these algorithms, the class of uncertainty-based algorithms is popular. Their key principle is to query the molecular structures for which the model's predictions have high uncertainty. We empirically show that this strategy is not optimal for nonuniform data distributions, as it collects many structures from sparsely sampled regions, which are less important to applications of the PES. We exploit a simple stochastic algorithm to correct for this behavior and implement it using regression trees, which have relatively small computational costs. We show that this algorithm requires around half the data to converge to the same accuracy as the uncertainty-based algorithm query-by-committee. Simulations on a 6D PES of pyrrole(H$_2$O) show that $< 15\,000$ configurations are enough to build a PES with a generalization error of $16$ cm$^{-1}$, whereas the final model with around $50\,000$ configurations has a generalization error of $11$ cm$^{-1}$.
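The contrast between deterministic query-by-committee and a stochastic query rule can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy 1D Morse-like curve (`toy_energy`) in place of electronic structure calculations, a small hand-written regression tree fitted on bootstrap resamples as the committee, and, as one plausible reading of "simple stochastic algorithm", sampling pool points with probability proportional to the committee disagreement. All function names and parameters below are hypothetical.

```python
import math
import random
import statistics


def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a tiny 1D regression tree: a leaf is a float, a node is (threshold, left, right)."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return statistics.fmean(ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    best = None  # (sum of squared errors, split position)
    for k in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[k - 1] == sx[k]:
            continue  # cannot split between identical inputs
        left, right = sy[:k], sy[k:]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, k)
    if best is None:
        return statistics.fmean(ys)
    k = best[1]
    threshold = 0.5 * (sx[k - 1] + sx[k])
    return (threshold,
            fit_tree(sx[:k], sy[:k], depth - 1, min_leaf),
            fit_tree(sx[k:], sy[k:], depth - 1, min_leaf))


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_committee(xs, ys, n_trees, rng):
    """Committee of trees, each fitted on a bootstrap resample of the labeled data."""
    n = len(xs)
    committee = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def disagreement(committee, x):
    """Uncertainty estimate: standard deviation of the committee predictions at x."""
    return statistics.pstdev([predict(tree, x) for tree in committee])


def query_by_committee(committee, pool, batch):
    """Deterministic rule: take the pool indices with the largest disagreement."""
    scores = [disagreement(committee, x) for x in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:batch]


def stochastic_query(committee, pool, batch, rng):
    """Assumed stochastic rule: sample without replacement, weighted by disagreement."""
    scores = [disagreement(committee, x) + 1e-12 for x in pool]  # keep weights positive
    candidates = list(range(len(pool)))
    chosen = []
    for _ in range(min(batch, len(candidates))):
        pick = rng.choices(candidates, weights=[scores[i] for i in candidates], k=1)[0]
        candidates.remove(pick)
        chosen.append(pick)
    return chosen


def toy_energy(r):
    """Hypothetical Morse-like curve standing in for an expensive calculation."""
    return (1.0 - math.exp(-(r - 1.0))) ** 2


rng = random.Random(0)
# Nonuniform pool: most geometries lie near the minimum at r = 1.
pool = [rng.gauss(1.0, 0.3) for _ in range(200)]
labeled_x, pool = pool[:10], pool[10:]
labeled_y = [toy_energy(x) for x in labeled_x]

committee = fit_committee(labeled_x, labeled_y, n_trees=10, rng=rng)
picked_qbc = query_by_committee(committee, pool, batch=5)
picked_stochastic = stochastic_query(committee, pool, batch=5, rng=rng)
```

Because the pool itself follows the data distribution, weighting a random draw by disagreement favors points that are both uncertain and located in densely populated regions, whereas the deterministic rule always takes the most uncertain points, which tend to sit in the sparse tails.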