Rex Cheng

Ho Kei (Rex) Cheng

I am a Ph.D. candidate at the University of Illinois Urbana-Champaign, advised by Alexander Schwing. Before that, I was at The Hong Kong University of Science and Technology, advised by Yu-Wing Tai and Chi Keung Tang.

My research focuses on two directions: building long-term memory architectures to model temporal dynamics in videos, and developing generative models using flow matching. I have interned at Adobe Research, Kaiber, Sony AI, and Meta FAIR/MSL.

[GitHub] | [Google Scholar] | [CV]

Selected Research

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer.

ICLR 2026

Project page / code / arXiv

A unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts.

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Ho Kei Cheng, Alexander Schwing.

ICCV 2025

Project page / code / arXiv

Provides straighter flows through condition-aware coupling of samples from the prior and data distributions, without the test-time degradation induced by naïve optimal transport.

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji.

CVPR 2025

Project page / code / arXiv / Space demo / Replicate

Generates high-quality synchronized audio from video or text inputs, with an architecture that enables training on data from multiple sources even when some modalities are missing.

Putting the Object Back into Video Object Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing.

CVPR 2024 Highlight

Project page / code / arXiv

Uses an object transformer to combine pixel-level and object-level features for efficient and robust video object segmentation in challenging scenarios. Used by iMotions and Annolid.

Tracking Anything with Decoupled Video Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee.

ICCV 2023

Project page / code / arXiv

Achieves open-world video segmentation by combining universal image segmentation with temporal propagation. Easy to extend.

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Ho Kei Cheng, Alexander Schwing.

ECCV 2022

Project page / code / arXiv

Approaches video object segmentation from a memory perspective with a pipeline that effectively models both short-term and long-term dependencies. Used by supervisely and Track-Anything.

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.

NeurIPS 2021

Project page / code / arXiv

A simple yet effective method to model pixel correspondences between frames. Used by Trioscope and BURST.

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Ho Kei Cheng, Yu-Wing Tai, Chi Keung Tang.

CVPR 2021

Project page / code / arXiv

Decouples interactive video segmentation into two components: single-frame interaction and temporal propagation, demonstrating significantly improved performance. Used by Sieve.

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

Ho Kei Cheng*, Jihoon Chung*, Yu-Wing Tai, Chi Keung Tang.

CVPR 2020

Project page / code / arXiv / pypi

An iterative refinement network that achieves high-quality 4K+ segmentation using only low-resolution training data (less than 500 pixels per side).

Work Experience

Research Intern, Meta FAIR/MSL

United States, May 2025 - Nov 2025

Developed object multiplexing for efficient video inference in SAM 3 and SAM 3.1.

Research Intern, Sony AI

Japan, May 2024 - Nov 2024

Developed MMAudio, a multimodal flow-matching generative model, and trained it from scratch.

Research Intern, Kaiber AI

United States, Nov 2023 - May 2024

Researched controllable video generation and editing pipelines using diffusion models.

Research Intern, Adobe Research

United States, May 2022 - Nov 2022

Developed DEVA and Cutie for efficient video segmentation; the technology contributed to Adobe Express and After Effects.

Invited Talks

Object-Level Reasoning in Video Object Segmentation and Its Multimodal Applications, Twelve Labs 2024
Segmenting Videos in the Open World, IBM Zurich, Accelerated Discovery 2023
Large-Scale Decoupled Video Segmentation, Apple 2023

Open Source Tools

av-benchmark: Evaluation suite for text-to-audio and video-to-audio generative models.
nitrous-ema: Post-hoc EMA implementation for PyTorch with minimal overhead.
vos-benchmark: Fast evaluation library for video object segmentation.
shared-memory-tensor-dataset: Simple demo for shared-memory data loading with DDP processes.

Professional Activities

Co-organized the 1st Workshop on Generative AI for Audio-Visual Content Creation (Gen4AVC) at ICCV 2025.
Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML (Outstanding Reviewer '22), AAAI.
Journal Reviewer: TMLR, IEEE TPAMI, IEEE TIP, IEEE TCSVT, Pattern Recognition.

Misc

I was a proud member of the HKUST Robotics Team. A short clip.
I am generally interested in artificial intelligence. I believe in AGIs and have high hopes for their potential to transform human civilization for the better.
"Man is condemned to be free. Condemned, because he did not create himself, yet, in other respects is free; because, once thrown into the world, he is responsible for everything he does."
Look at this cat in HKUST. Another picture. Or this cat.
"Ho Kei" (with the space) is my first name and "Cheng" is my last name. "Rex" is my commonly used English name, which is not part of my legal name. This is common in Hong Kong.