Deep Imputation on Large-Scale Drug Discovery Data

Benedict Irwin, Thomas Whitehead, Scott Rowland, Samar Mahmoud, Gareth Conduit, Matthew D. Segall

2021

More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. We apply a deep learning imputation method to three applications: i) target activity data compiled from a range of drug discovery projects, ii) a high-value, heterogeneous dataset covering complex absorption, distribution, metabolism and elimination properties, and iii) high-throughput screening data, testing the algorithm's limits on early-stage, noisy and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36 and 0.43 respectively across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19 and 0.23 respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracy of the predictions, enabling greater confidence in decision-making based on the imputed values.
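
The headline metric above is a median R² taken across many endpoints of a sparse dataset. As a minimal sketch of how such a figure can be computed, and not the authors' evaluation code, the Python below scores each endpoint (column) only on the compounds where a measured value exists and then takes the median over endpoints. The function names, the use of `None` for missing measurements, and the toy values are all assumptions made for illustration.

```python
from statistics import median


def r2(true_vals, pred_vals):
    """Coefficient of determination for one endpoint: 1 - SS_res / SS_tot."""
    mean_t = sum(true_vals) / len(true_vals)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    ss_tot = sum((t - mean_t) ** 2 for t in true_vals)
    return 1.0 - ss_res / ss_tot


def median_r2(y_true, y_pred):
    """Median R^2 over endpoints (columns) of a sparse compound-by-endpoint table.

    y_true uses None for unmeasured entries (an illustrative convention);
    those entries are excluded from the score of their endpoint.
    """
    scores = []
    for j in range(len(y_true[0])):
        pairs = [(rt[j], rp[j]) for rt, rp in zip(y_true, y_pred) if rt[j] is not None]
        if len(pairs) < 2:
            continue  # too few measurements to score this endpoint
        t, p = zip(*pairs)
        if len(set(t)) < 2:
            continue  # zero variance in the measurements: R^2 is undefined
        scores.append(r2(t, p))
    return median(scores)


# Toy data: 4 compounds x 2 endpoints, with one missing measurement per endpoint.
y_true = [[1.0, None], [2.0, 1.0], [3.0, 2.0], [None, 3.0]]
y_pred = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [0.0, 4.0]]

print(median_r2(y_true, y_pred))  # endpoint scores are 1.0 and 0.5, median 0.75
```

Taking the median rather than the mean keeps a few endpoints with very poor (strongly negative) R² from dominating the summary, which matters when many endpoints have only a handful of measurements.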
