Prediction of natural product classes using machine learning and 13C NMR spectroscopic data.
2020
Structure elucidation of chemical compounds is a complex and challenging activity that requires expertise and well-suited tools. For assigning the molecular structure of a given compound, 13C NMR is one of the most widely used techniques because of the broad range of structural information it provides. Because molecules found in nature can be grouped into natural product (NP) classes on the basis of structural similarities, we explore the possibility of predicting the NP class from 13C NMR data. Using freely available 13C NMR data of NPs, we trained four classifiers to predict eight common NP classes. The best performance was obtained with the XGBoost classifier, which reached F1-scores above 0.82. We also ran experiments with different percentages of positive samples and with glycoside presence included. Furthermore, we tested on cases outside the data set, obtaining performance above 80% for most classes. For the chroman class, restricting the test examples to the coumarin subclass raised the prediction accuracy to 100%.
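As a rough illustration of the quantities mentioned above, the following minimal sketch turns a list of 13C chemical shifts into a fixed-length feature vector and computes an F1-score for one NP class. The binning scheme (10 ppm bins over 0 to 240 ppm) and the function names `bin_shifts` and `f1_score` are assumptions made for this example; the abstract does not state how the spectra were encoded, and no classifier is trained here.

```python
from typing import List, Sequence


def bin_shifts(shifts_ppm: Sequence[float],
               max_ppm: float = 240.0,
               bin_width: float = 10.0) -> List[int]:
    """Count 13C peaks per chemical-shift bin.

    Hypothetical featurization: bin width and ppm range are assumed,
    not taken from the paper.
    """
    n_bins = int(max_ppm // bin_width)
    counts = [0] * n_bins
    for shift in shifts_ppm:
        # Peaks outside the assumed range are ignored.
        if 0.0 <= shift < max_ppm:
            counts[int(shift // bin_width)] += 1
    return counts


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1-score: harmonic mean of precision and recall.

    Labels are 1 (compound belongs to the class) or 0 (it does not).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up shift list and labels, for illustration only.
features = bin_shifts([12.5, 18.0, 128.3, 170.1])
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A vector such as `features` could then be passed to any standard classifier. Evaluating each class separately as a binary problem is one way to make sense of per-class F1-scores and of varying the percentage of positive samples.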