Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do

2017 
Numerous chemical data sets have become available for quantitative structure–activity relationship (QSAR) modeling studies. However, the quality of different data sources may be different based on the nature of experimental protocols. Therefore, potential experimental errors in the modeling sets may lead to the development of poor QSAR models and further affect the predictions of new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 various QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold c...
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    31
    Citations
    NaN
    KQI
    []