Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
arXiv 2024
Ho Kei Cheng¹, Masato Ishii², Akio Hayakawa², Takashi Shibuya², Alexander Schwing¹, Yuki Mitsufuji²,³
¹University of Illinois Urbana-Champaign
²Sony AI
³Sony Group Corporation
[Paper (soon)]
[Code]
[Huggingface Demo]
[Colab Demo]
TL;DR
MMAudio generates synchronized audio given video and/or text inputs.
Demo
<More results>