Online action proposal generation using spatio-temporal attention network
2022
module to capture the inter-spatial relationship between the features of incoming frames. The windowed spatial network produces a more robust clip-level feature representation and copes efficiently with noisy features such as occlusions or background scenes. Second, we introduce a temporal attention module that captures relevant temporal dynamics together with the localized spatial information, in order to model long-range inter-frame temporal relationships, since most real-life online videos are untrimmed. By applying these two attention modules sequentially, the proposed spatio-temporal network is able to generate precise action boundaries at a particular instant of time. In addition, the model generates fewer but discriminative temporal action proposals while maintaining a low computational cost and a high processing speed suitable for online settings.
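The sequential use of the two attention modules can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention without learned projections, mean pooling to form the clip-level feature, and a simple list of past clip features as temporal memory. All function names (`attend`, `spatial_clip_feature`, `temporal_context`, `online_step`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention of one query over keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def spatial_clip_feature(window):
    """Assumed spatial step: self-attention across the frame features of one
    window, followed by mean pooling into a single clip-level feature."""
    attended = [attend(frame, window, window)[0] for frame in window]
    dim = len(window[0])
    return [sum(a[i] for a in attended) / len(attended) for i in range(dim)]


def temporal_context(clip, history):
    """Assumed temporal step: the current clip feature attends over past
    clip features plus itself."""
    memory = history + [clip]
    return attend(clip, memory, memory)


def online_step(window, history):
    """Process one incoming window: spatial attention first, then temporal."""
    clip = spatial_clip_feature(window)
    context, weights = temporal_context(clip, history)
    history.append(clip)  # only past clips are stored, as in an online setting
    return context, weights


# Toy stream: two windows, each with two 2-dimensional frame features.
history = []
stream = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
for window in stream:
    context, weights = online_step(window, history)
```

In this sketch the temporal weights show how strongly the current clip relies on each earlier clip. A real model would add learned projections and a head that turns the context vector into boundary or actionness scores, which the abstract does not specify.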