Online action proposal generation using spatio-temporal attention network

2022 
module to capture the inter-spatial relationship between the features of incoming frames. The windowed spatial network produces more robust clip-level feature representation and efficiently deals with noisy features such as occlusion or background scenes. Second, we introduce a temporal attention module to capture relevant temporal dynamic information mutually to the localized spatial information to model the long inter-frame temporal relationship since most online real life videos are untrimmed in nature. By applying these two attention modules sequentially, the novel proposed spatio-temporal network model is able to generate precise action boundaries at a particular instant of time. In addition, the model generates fewer discriminative temporal action proposals while maintaining a low computational cost and high processing speed suitable for online settings.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []