Investigating the need for preprocessing of near-infrared spectroscopic data as a function of sample size

2020 
Abstract Preprocessing of near-infrared (NIR) spectra is an essential part of multivariate calibration. Its main aim is to remove artefacts introduced during measurement in order to improve prediction performance or interpretability. However, preprocessing can have undesired side effects. Additionally, calibration algorithms can learn to deal with artefacts by themselves when enough samples are available. This may change the effect preprocessing has on prediction performance as the calibration dataset grows. In this paper we investigate the interaction between calibration set size and preprocessing in NIR calibrations for several datasets. Results show that extending the calibration data with more samples improves prediction performance, regardless of the preprocessing strategy. Although prediction performance almost always benefits from preprocessing, extending the calibration data can reduce the effect of preprocessing on prediction performance. This means the optimal preprocessing strategy may change as a function of the number of samples. We demonstrate that using a Design of Experiments (DoE) approach to determine the optimal preprocessing strategy leads to equal or better prediction performance for all calibration set sizes compared with no preprocessing at all. Preprocessing is most valuable for small calibration sets, but as the calibration set grows it can become obsolete or even harmful. Therefore, we recommend always evaluating the effect of a preprocessing strategy before making or updating calibration models.
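The kind of comparison the abstract describes (prediction error against calibration set size, with and without preprocessing) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: standard normal variate (SNV) is used here only as one common scatter-correction step, since the abstract does not name the preprocessing methods studied, and the names `snv`, `rmsep`, `learning_curve`, and the `fit` interface are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum
    by its own mean and standard deviation."""
    m = mean(spectrum)
    s = pstdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def learning_curve(X, y, X_test, y_test, sizes, fit, preprocess=None, seed=0):
    """Return {calibration set size: RMSEP on the test set}.

    Assumption: fit(X_cal, y_cal) returns a callable predict(X).
    Any calibration model (e.g. PLS) can be wrapped to match this.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)  # one fixed random order, so smaller sets nest in larger ones
    if preprocess is not None:
        X = [preprocess(x) for x in X]
        X_test = [preprocess(x) for x in X_test]
    results = {}
    for n in sizes:
        idx = order[:n]
        predict = fit([X[i] for i in idx], [y[i] for i in idx])
        results[n] = rmsep(y_test, predict(X_test))
    return results
```

Running `learning_curve` once with `preprocess=None` and once with `preprocess=snv` (or any other candidate strategy) over the same `sizes` gives two curves that can be compared point by point, which is one simple way to check whether a preprocessing step still helps after the calibration set has been extended.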