Ho Kei (Rex) Cheng

I am a Ph.D. candidate at the University of Illinois Urbana-Champaign, advised by Alexander Schwing. Before that, I was at The Hong Kong University of Science and Technology, advised by Yu-Wing Tai and Chi Keung Tang.

I work on visual understanding, with a focus on videos. I have interned at Adobe Research, Kaiber, Sony AI, and FAIR/Meta MSL PAR.

[GitHub] | [Google Scholar] | [CV]


Research
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer.
arXiv 2025
Project page / code / arXiv
A unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts.
C2OT
Ho Kei Cheng, Alexander Schwing.
ICCV 2025
Project page / code / arXiv
Provides straighter flows through condition-aware coupling of samples from the prior and data distributions, without the test-time degradation induced by naïve optimal transport.
CVPR 2025
Project page / code / arXiv / Space demo / Replicate
Generates high-quality synchronized audio from video or text inputs, with an architecture that enables training on data from multiple sources even when some modalities are missing.
CVPR 2024 Highlight
Project page / code / arXiv
Uses an object transformer to combine pixel-level and object-level features for efficient and robust video object segmentation in challenging scenarios. Used by iMotions and Annolid.
ICCV 2023
Project page / code / arXiv
Achieves open-world video segmentation by combining universal image segmentation with temporal propagation. Easy to extend.
Ho Kei Cheng, Alexander Schwing.
ECCV 2022
Project page / code / arXiv
Approaches video object segmentation from a memory perspective with a pipeline that effectively models both short-term and long-term dependencies. Used by Supervisely and Track-Anything.
Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.
NeurIPS 2021
Project page / code / arXiv
A simple yet effective method to model pixel correspondences between frames. Used by Trioscope and BURST.
Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.
CVPR 2021
Project page / code / arXiv
Decouples interactive video segmentation into two components: single-frame interaction and temporal propagation, demonstrating significantly improved performance. Used by Sieve.
CVPR 2020
Project page / code / arXiv / pypi
An iterative refinement network that achieves high-quality 4K+ segmentation using only low-resolution training data (less than 500 pixels per side).


Invited Talks
Tools
Professional Activities
Misc