Temporal Saliency Query Network for Efficient Video Recognition

European Conference on Computer Vision (ECCV) 2022

Boyang Xia1,2*, Zhihao Wang1,2*, Wenhao Wu3,4†, Haoran Wang3, Jungong Han5
1Institute of Computing Technology, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3The University of Sydney   4Baidu Inc.   5Aberystwyth University
Code | Paper

A human can precisely select the most informative frames with the aid of prior knowledge about the probable category of a video. Inspired by this intuition, we cast frame sampling as a query-response task, which introduces category prior knowledge from both the visual and textual modalities into the temporal sampling framework. Experimental results show the efficacy of our method in both accuracy and practical speed.
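To make the query-response idea concrete, the following is a minimal sketch of one way it could be read, not the authors' implementation. It assumes each category is represented by a query vector (for instance, an embedding derived from visual or textual category priors), that each frame is represented by a feature vector, and that a frame's saliency is its strongest normalized response to any category query. The names `frame_saliency` and `select_frames`, the scaled dot-product response, and the max aggregation are all illustrative assumptions.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frame_saliency(frame_features, category_queries):
    """Score every frame by its response to the category queries.

    Assumption: the response is a scaled dot product, normalized over
    time per category, and aggregated with a max over categories.
    """
    dim = len(frame_features[0])
    scale = 1.0 / math.sqrt(dim)
    per_category = []
    for query in category_queries:
        responses = [dot(query, frame) * scale for frame in frame_features]
        per_category.append(softmax(responses))
    num_frames = len(frame_features)
    return [max(cat[t] for cat in per_category) for t in range(num_frames)]


def select_frames(frame_features, category_queries, k):
    """Return the indices of the k most salient frames in temporal order."""
    saliency = frame_saliency(frame_features, category_queries)
    ranked = sorted(range(len(saliency)), key=lambda t: saliency[t], reverse=True)
    return sorted(ranked[:k])


# Toy input: five 2-D frame features and two hypothetical category queries.
frames = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
selected = select_frames(frames, queries, k=2)
```

In this toy input, frames 2 and 3 align most strongly with one query each, so they are the two frames kept. A downstream recognizer would then process only the selected frames, which is where the efficiency gain of saliency-based sampling comes from.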