Abstract: Audio-visual translation (AVT) is the translation of speech in videos from one language to another. This paper describes the implementation of an automated AVT and lip-synced ...
Comparisons between StableAvatar and state-of-the-art (SOTA) audio-driven avatar video generation models highlight StableAvatar's superior performance in delivering infinite-length, ...