CosyVoice (Zero-Shot Cloning)
Zero-shot voice cloning mode of the CosyVoice model by Alibaba.
About
Fun-CosyVoice 3.0 is the latest open text-to-speech model from Alibaba's FunAudio team, built on a large language model backbone and aimed at zero-shot voice cloning in the wild. It accepts a short reference clip and generates speech that follows the target voice, language, and prosody. The repository ships multiple checkpoints (Fun-CosyVoice3, CosyVoice2, CosyVoice-300M variants), a Hugging Face demo, and vLLM-accelerated serving.
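Since the model works from a short reference clip, it can help to inspect the clip before passing it to any checkpoint. The following is a minimal sketch using only Python's standard `wave` module. It does not call CosyVoice itself, and the function names and the 3 to 30 second bounds are hypothetical placeholders rather than values from the CosyVoice documentation.

```python
import wave

# Assumed bounds for a "short" reference clip (hypothetical, not from CosyVoice docs).
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


def describe_reference_clip(path):
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,
        }


def is_usable_reference(info, min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Check the clip duration against the assumed bounds."""
    return min_seconds <= info["seconds"] <= max_seconds
```

A caller would run `describe_reference_clip("reference.wav")` on its own recording and adjust the bounds to whatever the chosen checkpoint's documentation recommends.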
Details
- Category: Voice Cloning & Voice Conversion
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- License: Apache-2.0
- Minimum VRAM: 8 GB
- Added: Apr 3, 2026
Related Tools
Community fork of RVC with additional features and optimizations.
Updated zero-shot voice cloning by MyShell with improved quality.
Voice cloning mode of XTTS-v2 for creating custom voice replicas.
Zero-shot voice cloning capabilities of the Fish Speech model.
Easy-to-use voice conversion framework based on retrieval for real-time voice cloning.
Text-free one-shot voice conversion model requiring no text transcription.