Reward-based Monte Carlo-Bayesian reinforcement learning for cyber preventive maintenance

2018 
Abstract This article considers a preventive maintenance problem related to cyber security in universities. A Bayesian Reinforcement Learning (BRL) problem is formulated using limited data from scan results and intrusion detection system warnings. The median estimated learning time (MELT) measure is introduced to evaluate the speed at which a control system effectively eliminates parametric uncertainty and probability is concentrated on a single scenario. It is demonstrated that the Monte Carlo BRL with enhancements including Latin hypercube sampling (LHS) to generate scenarios, identical systems multi-task learning, and reward-based learning achieves shorter MELT values, i.e., “faster” learning, and improved objective values compared with alternatives in a numerical study. Rigorous results establish the optimality of the derived control strategies and the fact that optimal learning is possible under steady state assumptions. Also, the real-world case study of policies for patching Linux critical server cyber vulnerabilities generates insights including the potential to reduce expenditure per host by mandating compensating controls for critical vulnerabilities.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    31
    References
    9
    Citations
    NaN
    KQI
    []