A new method for ligand-based virtual screening using linear algebra

2020 
Ligand-based virtual screening of large molecular databases can help reduce experimental costs by filtering and ranking promising compounds at an early stage of the drug development process. However, some ligand-based methods can be ineffective when presented with a large number of attributes extracted from an extensive dataset of compounds. Herein, we propose a drug-mining algorithm that can screen ligands and repurpose known drugs from any dataset for any target. The Milk-Way algorithm combines mathematical and regression methods to select promising compounds from a high-dimensional and imbalanced dataset without massive computational power. The main advantages of the Milk-Way algorithm are that it is non-recursive and that it can use more features than individuals in the same model. To validate the algorithm, we used literature data on known ligands and compared the performance of Milk-Way with that of Support Vector Machine (SVM) and Random Forest (RF). On the chosen datasets of HIV-1 reverse transcriptase receptors, our algorithm achieved a better AUC (area under the Receiver Operating Characteristic curve) than SVM and RF. We also evaluated the algorithm on 17 targets from a different database; the results were consistent with the previous ones, reaching AUC = 1.00. Feature selection performed through the Milk-Way algorithm improved not only its own AUC but also the AUC of SVM and Logistic Regression (LR). Moreover, a prospective screening targeting cyclin-dependent kinase type two (CDK-2) was carried out. The combined use of the algorithm metrics and molecular docking (DOCK6.8) suggested five promising drugs to be repositioned. Three of them had already been mentioned in the literature as possible inhibitors in related diseases.
To complement this thesis with a structure-based virtual screening technique, I explored the vector space of protein targets of approved drugs. This strategy resulted in a suggested treatment for COVID-19: tetrachlorodecaoxide. The main product of this dissertation is the Milk-Way algorithm, along with two by-products: a feature selection procedure and a mathematical model of protein targets of approved drugs. These products resulted in two patent applications, one published paper, and a draft of another.
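Since the comparisons above are all reported as AUC, a minimal sketch of how that metric behaves on an imbalanced ranking may help. It computes AUC as the fraction of (active, inactive) pairs in which the active compound receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not data or code from the Milk-Way work.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen active (label 1)
    is scored above a randomly chosen inactive (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy screen: 2 actives among 8 compounds (imbalanced).
labels = [1, 0, 0, 1, 0, 0, 0, 0]
# Both actives outrank every inactive.
perfect = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4, 0.05, 0.5]
# The second active is outranked by two inactives.
imperfect = [0.9, 0.2, 0.1, 0.35, 0.3, 0.4, 0.05, 0.5]

print(roc_auc(labels, perfect))
print(roc_auc(labels, imperfect))
```

In this sketch, an AUC of 1.00 means every active is ranked above every inactive, regardless of how few actives the dataset contains. Under that pairwise definition, AUC depends only on the ordering of the scores, which makes it a common choice for imbalanced screening data.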