An Autoencoder with a Memory Module for Video Anomaly Detection

2021 
With the rise of deep convolutional neural networks (CNNs), video anomaly detection (VAD) has attracted considerable attention. Autoencoders are a popular framework for VAD, and many existing methods are built on them. However, these methods rely on a closed-world assumption, i.e., they do not fully account for the diversity of normal patterns. Moreover, a sufficiently powerful CNN allows the autoencoder to reconstruct or predict abnormal video frames proficiently, causing anomalies to be missed. To mitigate these drawbacks, we propose an Autoencoder with a Memory Module (AMM) that performs video anomaly detection by predicting video frames. AMM consists of three modules: an encoder, a decoder, and a memory module. First, consecutive frames are fed into the encoder to yield latent spatial features. Then, these features are used as queries to retrieve the corresponding memory items in the memory module, producing memory-mapped features. Finally, the memory-mapped features are passed to the decoder to predict the next frame. To match queries against memory items accurately, we propose a memory triplet loss that accounts for both magnitude and angular discrepancies between queries and memory items. At the training stage, AMM combines the memory triplet loss, a prediction loss, and a multi-scale structural similarity measure. Moreover, memory items are retrieved and updated via a scaled dot-product model, which alleviates vanishing-gradient problems to a certain extent. Extensive experiments on three public benchmark datasets demonstrate the superior performance of AMM.
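The abstract describes two mechanisms: scaled dot-product retrieval of memory items, and a triplet loss combining magnitude and angular terms. The paper's exact formulas are not given here, so the following is a minimal NumPy sketch under stated assumptions: retrieval is assumed to be softmax attention over memory items scaled by the feature dimension, and the triplet loss is assumed to pair a Euclidean (magnitude) margin with a cosine (angle) margin between a query, its nearest memory item, and its second-nearest item. Function names, the weighting `alpha`, and the shared `margin` are illustrative, not the authors' definitions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(queries, memory):
    """Scaled dot-product retrieval (assumed form).

    queries: (N, d) latent spatial features from the encoder.
    memory:  (M, d) learnable memory items.
    Returns the memory-mapped features (N, d) and the attention
    weights (N, M) used to combine memory items.
    """
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)   # scale to temper gradients
    weights = softmax(scores, axis=-1)         # soft addressing over items
    return weights @ memory, weights

def memory_triplet_loss(q, memory, margin=1.0, alpha=0.5):
    """Hypothetical memory triplet loss with magnitude and angle terms.

    Pulls the query q toward its nearest memory item (positive) and
    pushes it from the second-nearest (negative), penalizing both
    Euclidean (size) and cosine (angle) discrepancies.
    """
    dists = np.linalg.norm(memory - q, axis=1)
    idx = np.argsort(dists)
    pos, neg = memory[idx[0]], memory[idx[1]]
    # Magnitude term: standard Euclidean triplet margin.
    mag = max(0.0, dists[idx[0]] - dists[idx[1]] + margin)
    # Angle term: cosine-similarity triplet margin (alpha is illustrative).
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    ang = max(0.0, cos(q, neg) - cos(q, pos) + margin)
    return mag + alpha * ang
```

Because the attention weights are a softmax, each memory-mapped feature is a convex combination of memory items, which is what restricts the decoder to predicting from normal prototypes and makes abnormal frames harder to reproduce.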