The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations.

2015 
Product quality assurance strategies in the production of biopharmaceuticals are currently undergoing a transformation from empirical “quality by testing” to rational, knowledge-based “quality by design” approaches. The major challenges in this context are the fragmentary understanding of bioprocesses and the severely limited real-time access to process variables related to product quality and quantity. Data-driven modeling of process variables in combination with model predictive process control concepts represents a potential solution to these problems. The selection of the statistical techniques best suited to bioprocess data analysis and modeling is a key criterion. In this work, a series of recombinant Escherichia coli fed-batch production processes with varying cultivation conditions was conducted, employing a comprehensive on- and offline process monitoring platform. The applicability of two machine learning methods, random forest and neural networks, was investigated for the prediction of cell dry mass and recombinant protein based on process parameters available online and on two-dimensional multi-wavelength fluorescence spectroscopy. Models based solely on routinely measured process variables give a satisfactory prediction accuracy of about ± 4% for the cell dry mass, while additional spectroscopic information allows the protein concentration to be estimated within ± 12%. The results clearly argue for a combined approach: neural networks as the modeling technique and random forest as the variable selection tool.
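The combined approach named in the abstract (random forest for variable selection, a neural network for the final model) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of real fermentation signals, and the function name `select_and_fit`, the number of retained variables, and the network size are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_and_fit(X, y, n_keep=5, seed=0):
    """Rank inputs by random forest importance, then fit a small
    neural network on the n_keep highest-ranked inputs only."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    # Column indices sorted by decreasing impurity-based importance.
    keep = np.argsort(forest.feature_importances_)[::-1][:n_keep]

    # Standardize inputs before the network; lbfgs suits small data sets.
    net = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                     max_iter=2000, random_state=seed),
    )
    net.fit(X[:, keep], y)
    return keep, net


# Synthetic stand-in data (an assumption): 20 candidate inputs, of which
# only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

keep, net = select_and_fit(X_train, y_train, n_keep=5)
r2 = net.score(X_test[:, keep], y_test)  # R^2 on held-out samples
print("selected columns:", sorted(keep.tolist()))
print("held-out R^2:", round(r2, 3))
```

In a real setting the columns of `X` would be online process signals and fluorescence intensities, and `y` an offline reference measurement such as cell dry mass. Held-out data should then be split by fermentation run rather than by individual sample, since samples from the same run are strongly correlated.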