XTTS-v2 (Voice Cloning)
Voice cloning mode of XTTS-v2 for creating custom voice replicas.
About
XTTS-v2, shipped as part of the Coqui TTS library, is a multilingual voice-cloning model that reproduces a target voice from roughly six seconds of reference audio and supports 17 languages. It supports cross-language cloning, so a reference clip in one language can drive speech output in another, and inference runs on consumer GPUs with around 4 GB of VRAM. The model weights are distributed under the Coqui Public Model License (CPML), while the surrounding Coqui TTS codebase is licensed under MPL-2.0.
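As a rough illustration of the cloning workflow described above, the sketch below validates a request and then calls the Coqui TTS Python API. It assumes the `TTS` package and PyTorch are installed and that the model is published under the name `tts_models/multilingual/multi-dataset/xtts_v2`. The language codes are the 17 listed on the XTTS-v2 model card as best recalled here, so check them against your installed version. The helper names `build_request` and `clone_voice` are hypothetical and not part of the library.

```python
from pathlib import Path

# Assumption: the 17 language codes listed on the XTTS-v2 model card.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

# Assumption: model identifier used by the Coqui TTS model manager.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_request(text, reference_wav, language, out_path="cloned.wav"):
    """Validate inputs and return keyword arguments for TTS.tts_to_file."""
    lang = language.strip().lower()
    if lang not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": str(Path(reference_wav)),  # ~6 s clip of the target voice
        "language": lang,                         # output language; may differ from the clip
        "file_path": str(Path(out_path)),
    }


def clone_voice(text, reference_wav, language, out_path="cloned.wav"):
    """Synthesize `text` in the voice of `reference_wav` and write a WAV file."""
    kwargs = build_request(text, reference_wav, language, out_path)

    # Heavy imports are deferred so the validation helper works without them.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(MODEL_NAME).to(device)  # downloads the model on first use
    tts.tts_to_file(**kwargs)
    return kwargs["file_path"]
```

Cross-language cloning is then a matter of passing a language code that differs from the language spoken in the reference clip, for example `clone_voice("Hola, ¿qué tal?", "english_speaker.wav", "es")`, where `english_speaker.wav` is a placeholder file name.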
Details
- Category: Voice Cloning & Voice Conversion
- Price: Free
- Platform: Local/Desktop
- Difficulty: Easy (2/5)
- License: CPML
- Minimum VRAM: 4 GB
- Added: Apr 3, 2026
Related Tools
Zero-shot voice cloning mode of CosyVoice model by Alibaba.
Community fork of RVC with additional features and optimizations.
Updated zero-shot voice cloning by MyShell with improved quality.
Zero-shot voice cloning capabilities of Fish Speech model.
Easy-to-use voice conversion framework based on retrieval for real-time voice cloning.
Text-free one-shot voice conversion model requiring no text transcription.