Prediction of natural products classes using machine learning and 13C NMR spectroscopic data.

2020 
Structure elucidation of chemical compounds is a complex and challenging activity that requires expertise and well-suited tools. To assign the molecular structure of a given compound, 13C NMR is one of the most widely used techniques because of the broad range of structural information it provides. Since molecules found in nature can be grouped into natural product (NP) classes on the basis of structural similarities, we explore the possibility of predicting the NP class from 13C NMR data. Using freely available 13C NMR data of NPs, we trained four classifiers to predict eight common NP classes. The best performance was obtained with the XGBoost classifier, which reached F1-scores above 0.82. We also performed experiments with different percentages of positive samples and with the presence of glycosides included. Furthermore, we tested cases outside the data set, yielding performances above 80% for most classes. For the chroman cases, we restricted the test examples to the coumarin subclass, and the prediction accuracy increased to 100%.
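For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a binary F1-score can be computed for a single NP class treated as the positive label. The function name `f1_score` and the example labels are hypothetical and purely illustrative; they are not taken from the paper, and the one-class-versus-rest framing is an assumption.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = compound belongs to the class, 0 = it does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when positive samples are rare, which is relevant to experiments that vary the percentage of positive samples.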