MuseTalk
Real-time high-quality lip-sync model for audio-driven talking face generation.
About
MuseTalk is an audio-driven lip-sync model from Tencent that generates high-resolution talking-face video at 30+ fps from a reference image or video and an input audio track. It operates in the latent space of an FT-MSE-VAE. Training combines a spatio-temporal sampling approach with perceptual, GAN, and sync losses to balance visual quality against lip-sync accuracy. Version 1.5 ships inference code, training code, and model weights for self-hosted use.
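As a rough illustration of how several loss terms can be balanced against each other, the sketch below forms a weighted sum of perceptual, GAN, and sync losses. It is not MuseTalk's training code: the names `LossWeights` and `combined_loss` and the default weight values are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights for illustration only; not MuseTalk's actual values.
    perceptual: float = 1.0
    gan: float = 0.5
    sync: float = 0.25


def combined_loss(perceptual, gan, sync, weights=LossWeights()):
    """Return the weighted sum of the three loss terms."""
    return (
        weights.perceptual * perceptual
        + weights.gan * gan
        + weights.sync * sync
    )


if __name__ == "__main__":
    # Example per-batch loss values (made up for the demonstration).
    total = combined_loss(perceptual=2.0, gan=4.0, sync=8.0)
    print(f"combined loss: {total}")
```

Raising the sync weight relative to the other two would push such an objective toward lip accuracy; raising the perceptual or GAN weight would favor visual quality.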
Details
- Category: AI Animation & Motion
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- Minimum VRAM: 6 GB
- Added: Apr 3, 2026
Related Tools
Realistic human pose and facial expression transfer from video.
Audio-driven talking head animation from a single image.
Audio-driven portrait animation with lifelike expressions and head movements.
Hierarchical audio-driven visual synthesis for portrait animation.
Accurately lip-sync videos to any audio using a pre-trained model.
Efficient portrait animation framework for stitching and retargeting.