About Me

I am currently a staff engineer at Kuaishou Technology. I earned my Master's degree in 2023 from the Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS. During my graduate studies, my research focused on efficient video understanding, aiming to design lightweight yet powerful architectures for video representation. After graduating, I joined Kuaishou to work on large-scale video recommender systems serving 400M DAUs. In 2026, I shifted my focus to the intersection of language agents and video creation.

Recommender Systems 🚀

ChorusCVR: Chorus Supervision for Entire Space Post-Click Conversion Rate Modeling

Wei Cheng, Yucheng Lu, Boyang Xia et al.
Accepted by WSDM 2026 | [Paper]
Tackles sample selection bias (SSB) in CVR estimation by introducing a novel Entire-Space Dual Multi-Task Learning framework. It discriminates between factual negative samples (clicked but unconverted) and ambiguous ones (unclicked), significantly boosting CVR model robustness in Kuaishou’s e-commerce live service.

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

Boyang Xia, et al.
Preprint (2026) | [Paper]
Rethinks multi-objective ranking ensembles by shifting from traditional score fusion to rank consistency. By introducing an end-to-end differentiable ranking technique to directly optimize multi-objective AUC, it achieved a 2.6% purchase gain on Kuaishou’s live-streaming e-commerce platform.

STCRank: Spatio-temporal Collaborative Ranking for Interactive Recommender System at Kuaishou E-shop

Boyang Xia, R. Bao, H. Jiang, J. Wang, W. Ou
Preprint (2026) | [Paper]
Proposes a spatio-temporal collaborative ranking framework tailored for immersive and interactive e-commerce recommendation at Kuaishou, empowering users to actively guide the recommendation loop and drastically improving user engagement and long-term conversion.

Video Understanding and Computer Vision

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Accepted by European Conference on Computer Vision 2022 (ECCV'22, 19.9$\%$ acceptance rate).
A sampler that runs $4\times$ faster in practice than SOTA methods.
[Paper]|[Project]

To accelerate video recognition architectures, one typically builds a lightweight key-frame sampler that first selects a few salient frames and then invokes a recognizer on them, reducing temporal redundancy. However, existing methods neglect the discrimination between non-salient and salient frames in their training objectives. We introduced a novel multi-granularity supervision scheme to suppress non-salient frames and achieved SOTA accuracy with very low GFLOPs and wall-clock time ($4\times$ faster than SOTA methods) on 4 video recognition benchmarks.

Temporal Saliency Query Network for Efficient Video Recognition

Accepted by European Conference on Computer Vision 2022 (ECCV'22, 19.9$\%$ acceptance rate).
The first work to model temporal sampling as a query-response task.
[Paper]|[Project]

A human can precisely select the most informative frames with the aid of prior knowledge about the probable category of the video. Inspired by this intuition, we are the first to cast frame sampling as a query-response task, introducing category prior knowledge from both visual and textual modalities into the temporal sampling framework. Experimental results show the efficacy of our method in both accuracy and practical speed.

Time Series Anomaly Detection with Memory-Enhanced Composite Neural Networks

Technical report.
An effective framework for multivariate unsupervised time series anomaly detection.
[Paper]|[Code]

Low discrimination between normal and abnormal data is an important challenge for reconstruction-based unsupervised anomaly detection methods, because the model inevitably learns trivial patterns caused by sensor noise in the training data. We address this problem with a memory mechanism: non-trivial normal patterns are stored in a memory matrix, and time series representations are cleaned through attentional memory addressing. Extensive experiments on two industrial control systems (ICS) cybersecurity datasets demonstrate the effectiveness of our approach.

Undergraduate Works

Subtle Appearance Anomaly Detection Based on Deep Learning

Received the Outstanding Bachelor Thesis Award (Top 2$\%$) in 2020.
A weakly-supervised product counterfeit detection framework with 70$\%+$ annotation savings.

An efficient and effective framework for product counterfeit detection based on learnable cropping of informative regions. It is noteworthy that the framework uses only image-level annotations, without region-level annotations.

Competitions

Contest of Automatic Identification of Butterflies In the Wild in 2020

Object detection task, with small-object, long-tail, and occlusion challenges.
CCF $\times$ BCDI
Rank 3/1004

POI Name Generation Contest in 2021

Text generation with multimodal (image and text) information.
CCF $\times$ Amap. Inc.
Rank 3/1107

Global AI Innovation Contest in 2022

Image-text matching on key attributes in e-commerce.
Chinese Association for Artificial Intelligence $\times$ JD.COM
Rank 7/1300+

Publications

Boyang Xia, et al. NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition (Accepted by ECCV’2022)
Boyang Xia, et al. Temporal Saliency Query Network for Efficient Video Recognition (Accepted by ECCV’2022)
Li, L., Gao, K., Cao, J., Huang, Z., Weng, Y., Mi, X., … & Xia, B. Progressive Domain Expansion Network for Single Domain Generalization (Accepted by CVPR’2021)
Yang, H., Wu, W., Wang, L., Jin, S., Xia, B., Yao, H., & Huang, H. Temporal Action Proposal Generation with Background Constraint (Accepted by AAAI’2022)
Wang, H., He, D., Wu, W., Xia, B., Yang, M., Li, F., Yu, Y., Ji, Z., Ding, E., & Wang, J. CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval (Accepted by ECCV’2022)
Xia, B. Y., Cao, C., Han, Y. H., et al. Universal photonic three-qubit quantum gates with two degrees of freedom assisted by charged quantum dots inside single-sided optical microcavities (Published in Laser Physics, 2018; SCI District 3)