I am a Ph.D. candidate at the University of Illinois Urbana-Champaign, advised by Alexander Schwing.
Before that, I was at The Hong Kong University of Science and Technology, advised by Yu-Wing Tai and Chi Keung Tang.
I work on visual understanding, with a focus on videos.
I have interned at Adobe Research, Kaiber, and Sony AI.
I am currently interning at Meta FAIR Perception.
Provides straighter flows through condition-aware coupling of samples from the prior and data distributions, without the test-time degradation induced by naïve optimal transport.
Generates high-quality synchronized audio from video or text inputs, with an architecture that enables training on data from multiple sources even when some modalities are missing.
Click here to watch a fun video!
Uses an object transformer to combine pixel-level and object-level features for efficient and robust video object segmentation in challenging scenarios.
Used by iMotions and Annolid.
Approaches video object segmentation from a memory perspective with a pipeline that effectively models both short-term and long-term dependencies.
Used by Supervisely and Track-Anything.
Decouples interactive video segmentation into two components: single-frame interaction and temporal propagation, demonstrating significantly improved performance.
Used by Sieve.
I am generally interested in artificial intelligence. I believe in AGIs and have high hopes for their potential to transform human civilization for the better.
"Man is condemned to be free. Condemned, because he did not create himself, yet in other respects is free; because, once thrown into the world, he is responsible for everything he does."
"Ho Kei" (with the space) is my first name and "Cheng" is my last name. "Rex" is my commonly used English name, which is not part of my legal name. This is common in Hong Kong.