Ho Kei (Rex) Cheng

I am a Ph.D. candidate at the University of Illinois Urbana-Champaign, advised by Alexander Schwing. Before that, I was at The Hong Kong University of Science and Technology, advised by Yu-Wing Tai and Chi Keung Tang.

My research focuses on two directions: building long-term memory architectures to model temporal dynamics in videos, and developing generative models using flow matching. I have interned at Adobe Research, Kaiber, Sony AI, and FAIR/Meta MSL PAR.

[GitHub] | [Google Scholar] | [CV]


Selected Research
Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer.
ICLR 2025
Project page / code / arXiv
A unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts.
C2OT
Ho Kei Cheng, Alexander Schwing.
ICCV 2025
Project page / code / arXiv
Provides straighter flows through condition-aware coupling of samples from the prior and data distributions, without the test-time degradation induced by naïve optimal transport.
CVPR 2025
Project page / code / arXiv / Space demo / Replicate
Generates high-quality synchronized audio from video or text inputs, with an architecture that enables training on data from multiple sources even when some modalities are missing.
CVPR 2024 Highlight
Project page / code / arXiv
Uses an object transformer to combine pixel-level and object-level features for efficient and robust video object segmentation in challenging scenarios. Used by iMotions and Annolid.
ICCV 2023
Project page / code / arXiv
Achieves open-world video segmentation by combining universal image segmentation with temporal propagation. Easy to extend.
Ho Kei Cheng, Alexander Schwing.
ECCV 2022
Project page / code / arXiv
Approaches video object segmentation from a memory perspective with a pipeline that effectively models both short-term and long-term dependencies. Used by Supervisely and Track-Anything.
Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.
NeurIPS 2021
Project page / code / arXiv
A simple yet effective method to model pixel correspondences between frames. Used by Trioscope and BURST.
Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.
CVPR 2021
Project page / code / arXiv
Decouples interactive video segmentation into two components: single-frame interaction and temporal propagation, demonstrating significantly improved performance. Used by Sieve.
CVPR 2020
Project page / code / arXiv / pypi
An iterative refinement network that achieves high-quality 4K+ segmentation using only low-resolution training data (less than 500 pixels per side).


Work Experience
Meta
United States, May 2025 - Nov 2025
Contributed to video perception in SAM 3, focusing on efficient memory designs for object tracking.
Sony AI
Japan, May 2024 - Nov 2024
Developed MMAudio, a multimodal flow-matching generative model, and trained it from scratch.
Kaiber
United States, Nov 2023 - May 2024
Researched controllable video generation and editing pipelines using diffusion models.
Adobe
United States, May 2022 - Nov 2022
Proposed DEVA and Cutie for efficient video segmentation; this technology contributed to Adobe Express and After Effects.


Invited Talks
Open Source Tools
Professional Activities
Misc