Holistic Prediction of pKa in Diverse Solvents Based on Machine Learning Approach.

2020 
Although numerous theoretical approaches have been developed for predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values has remained a major challenge. On the basis of the iBonD experimental pKa database, curated across 39 solvents, a holistic pKa prediction model was established using a machine learning approach. Descriptors combining structural and physical organic parameters (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with MAE values as low as 0.87 pKa units. The ability to predict in diverse solvents allows a comprehensive mapping of all possible pKa correlations between different solvents, verifying the existence of transfer learning features. The holistic model was validated with high accuracy by predicting the aqueous pKa and micro-pKa of pharmaceutical molecules and the pKa values of organocatalysts in DMSO and MeCN. An online prediction platform (http://pka.luoszgroup.com) was constructed based on the current model; it can provide pKa predictions that would otherwise be out of reach for different types of X-H acidity in the most commonly used solvents.
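To make the idea of a single "holistic" model more concrete, the following is a minimal sketch of how one input row might be assembled and how an MAE in pKa units is computed. It is not the authors' implementation: the three-solvent vocabulary, the function names, the numeric ionic-status label, and all numbers are assumptions made for illustration, and the actual SPOC and ISL definitions are not given in the abstract.

```python
from typing import Sequence

# Hypothetical solvent vocabulary (the source database covers 39 solvents).
SOLVENTS = ["water", "DMSO", "MeCN"]


def one_hot(value: str, vocabulary: Sequence[str]) -> list[float]:
    """Encode `value` as a one-hot vector over `vocabulary`."""
    if value not in vocabulary:
        raise ValueError(f"unknown solvent: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(descriptors: Sequence[float], solvent: str,
                   ionic_status: int) -> list[float]:
    """Concatenate molecular descriptors, solvent encoding, and an
    ionic-status label into one row, so that a single regressor can be
    trained across all solvents (assumed layout, for illustration only)."""
    return list(descriptors) + one_hot(solvent, SOLVENTS) + [float(ionic_status)]


def mean_absolute_error(predicted: Sequence[float],
                        measured: Sequence[float]) -> float:
    """Mean absolute error between predicted and measured pKa values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


if __name__ == "__main__":
    # Invented descriptor values and pKa numbers, not data from the paper.
    row = build_features([0.5, 1.2], solvent="DMSO", ionic_status=-1)
    print(row)
    print(mean_absolute_error([1.0, 3.0], [2.0, 5.0]))
```

Rows built this way could be passed to any regressor, such as a gradient-boosted tree model or a neural network; because the solvent is part of the input, the same model can be queried for one molecule in several solvents.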