ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors

Yu Wang, Huang Yongping, Xuanjing Shen

2021

How to integrate temporal and spatial continuity information when designing a video texture descriptor is crucial for video face recognition and for video analysis and understanding, yet it has not been properly addressed. This paper proposes a novel video face recognition algorithm based on an aggregated local spatial-temporal descriptor (ST-VLAD), followed by a novel Fisher criterion-based weight-learning method. The approach portrays the local information of the video more accurately and thereby largely improves the representational ability of the description vectors. The proposed descriptor was tested on two representative databases, Honda/UCSD and the YouTube Face database, achieving accuracies of 89.7% and 87.3%, respectively. The proposed method greatly outperformed existing state-of-the-art methods, suggesting broad potential utility in the field of video face recognition.
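
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of generic VLAD aggregation with optional per-cluster weights, plus a scalar Fisher ratio. It is not the authors' implementation: the abstract does not specify how the spatial-temporal descriptors are extracted or how the learned weights are applied, so the function names `vlad` and `fisher_ratio`, the hard nearest-center assignment, and the per-cluster weighting are all assumptions made for illustration.

```python
import math


def vlad(descriptors, centers, weights=None):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    descriptors: list of d-dimensional local descriptors (lists of floats).
    centers: list of k codebook centers, assumed to be learned beforehand.
    weights: optional k per-cluster weights (hypothetical; how the paper
             applies its learned weights is not stated in the abstract).
    """
    k, d = len(centers), len(centers[0])
    if weights is None:
        weights = [1.0] * k
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest center (squared L2).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])),
        )
        # Accumulate the residual between descriptor and assigned center.
        for i in range(d):
            residuals[nearest][i] += x[i] - centers[nearest][i]
    # Concatenate the weighted residual blocks, then L2-normalize.
    flat = [weights[j] * r for j in range(k) for r in residuals[j]]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def fisher_ratio(values_by_class):
    """Between-class scatter divided by within-class scatter (scalar feature)."""
    all_vals = [v for vals in values_by_class for v in vals]
    mean = sum(all_vals) / len(all_vals)
    between = sum(
        len(vals) * (sum(vals) / len(vals) - mean) ** 2
        for vals in values_by_class
    )
    within = sum(
        (v - sum(vals) / len(vals)) ** 2
        for vals in values_by_class
        for v in vals
    )
    return between / within if within > 0 else float("inf")
```

Under these assumptions, a Fisher ratio computed per cluster on training data could serve as that cluster's entry in `weights`, so that more discriminative clusters contribute more to the final description vector.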
