Robust QSAR model development in high-throughput catalyst discovery based on genetic parameter optimisation

2009 
Abstract: High-throughput strategies are gaining importance in catalyst formulation and discovery. The increased experimental capacity produces valuable data from which quantitative structure–activity relationship (QSAR) models can be developed to link catalyst composition and structure with the final performance. Various QSAR modelling algorithms are available; however, they are generally configurable, and their performance depends strongly on the correct choice of parameters. With the proliferation and increasing sophistication of integrated data-mining tools, there is a need for systematic, robust, and generic parameter optimisation methods. This paper investigates a genetic algorithm (GA) for parameter optimisation of several QSAR methods for classification and regression, including feed-forward neural networks, decision tree generators, and support vector machines, with cross-validation providing the performance estimate. The methods were applied to four datasets: three from recent reports of high-throughput studies and one from our own laboratory. The results confirm that parameter optimisation is a critical step in QSAR modelling and demonstrate the effectiveness of the GA approach. The best results were shared among the modelling methods, emphasising the importance of considering more than one type of model.
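The idea of using a GA to tune model parameters against a cross-validated error can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the GA operators, encodings, or settings, so everything below is an assumption made for illustration. A distance-weighted k-nearest-neighbour regressor stands in for the neural networks, decision trees, and support vector machines, the data are synthetic, and all function names (make_data, cv_error, genetic_search, and so on) are hypothetical.

```python
import random


def make_data(n=60, seed=0):
    """Synthetic 1-D regression data (an assumption, not a catalyst dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k, power):
    """Distance-weighted k-NN prediction; k and power are the tuned parameters."""
    nearest = sorted(zip(train_x, train_y), key=lambda t: abs(t[0] - x))[:k]
    weights = [1.0 / (abs(tx - x) + 1e-6) ** power for tx, _ in nearest]
    return sum(w * ty for w, (_, ty) in zip(weights, nearest)) / sum(weights)


def cv_error(params, xs, ys, folds=5):
    """Mean squared error over k-fold cross-validation: the GA fitness (lower is better)."""
    k, power = params
    n = len(xs)
    total = 0.0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        tr_x = [xs[i] for i in range(n) if i not in test_idx]
        tr_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            total += (knn_predict(tr_x, tr_y, xs[i], k, power) - ys[i]) ** 2
    return total / n


def random_individual(rng):
    """One candidate parameter set: (k in 1..20, power in 0..3)."""
    return (rng.randint(1, 20), rng.uniform(0.0, 3.0))


def mutate(ind, rng, rate=0.3):
    """Perturb each parameter with probability `rate`, clamped to its range."""
    k, power = ind
    if rng.random() < rate:
        k = min(20, max(1, k + rng.choice([-2, -1, 1, 2])))
    if rng.random() < rate:
        power = min(3.0, max(0.0, power + rng.gauss(0.0, 0.3)))
    return (k, power)


def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from either parent."""
    return (rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]]))


def tournament(scored, rng, size=3):
    """Pick the best of `size` randomly drawn (error, individual) pairs."""
    return min(rng.sample(scored, size), key=lambda s: s[0])[1]


def genetic_search(xs, ys, pop_size=12, generations=8, seed=1):
    """Evolve parameter sets; returns (best_params, best_error, best-error history)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((cv_error(ind, xs, ys), ind) for ind in population),
                        key=lambda s: s[0])
        history.append(scored[0][0])
        next_pop = [scored[0][1]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            next_pop.append(mutate(child, rng))
        population = next_pop
    best_err, best = min(((cv_error(ind, xs, ys), ind) for ind in population),
                         key=lambda s: s[0])
    history.append(best_err)
    return best, best_err, history


if __name__ == "__main__":
    xs, ys = make_data()
    best, err, history = genetic_search(xs, ys)
    print("best (k, power):", best, "cross-validated MSE:", err)
```

Because the best individual is copied unchanged into each new generation, the best cross-validated error in this sketch can never get worse from one generation to the next. To compare several model types, as the abstract recommends, the same search would be run once per model with a model-specific parameter encoding and fitness function.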