Temporal Saliency Query Network for Efficient Video Recognition
European Conference on Computer Vision (ECCV) 2022
Boyang Xia1,2*, Zhihao Wang1,2*, Wenhao Wu3,4†, Haoran Wang3, Jungong Han5,
2University of Chinese Academy of Sciences
3The University of Sydney 4Baidu Inc. 5Aberystwyth University
A human can precisely select the most informative frames of a video with the aid of prior knowledge about its probable category. Inspired by this intuition, we are the first to cast frame sampling as a query-response task, which introduces category prior knowledge from both the visual and textual modalities into the temporal sampling framework. Experimental results show the efficacy of our method in both accuracy and practical speed.