XTTS

Coqui deep learning toolkit for text-to-speech with pretrained models in 1100+ languages.

Open Source, Self Hosted, Offline Capable


About

XTTS is the flagship voice cloning model of Coqui TTS, a Python deep learning toolkit for text-to-speech that has been battle-tested in research and production. XTTS v2 clones a voice from a reference clip as short as six seconds, speaks 17 languages with cross-language cloning, and generates 24 kHz audio that can be streamed with latency under 200 milliseconds, which made it one of the most capable open voice cloning systems of its generation. The wider toolkit spans classic architectures such as Tacotron 2, Glow-TTS, and VITS along with multiple neural vocoders, provides training and fine-tuning recipes for new languages and voices, and, through its Fairseq integration, exposes pretrained models covering more than 1100 languages. The toolkit code is licensed under MPL 2.0, while the XTTS model weights ship under the Coqui Public Model License, which restricts commercial use. Coqui, the company, shut down in early 2024, but community forks keep the project maintained and its models remain widely deployed.
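
For readers who want to try the voice cloning described above, here is a minimal sketch using the toolkit's Python API. It assumes the `TTS` package is installed and that the reference clip is a PCM WAV file. The six-second guard and the helper names `clip_seconds` and `clone_voice` are illustrative choices made for this example, not part of the library. The first call downloads the model weights and may ask you to accept the model license.

```python
import wave

# Model identifier as listed in the Coqui TTS model zoo.
XTTS_V2 = "tts_models/multilingual/multi-dataset/xtts_v2"

# Illustrative guard based on the "six seconds" figure in the description;
# this is an assumption of the example, not a limit enforced by the library.
MIN_REFERENCE_SECONDS = 6.0


def clip_seconds(wav_path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_voice(text, reference_wav, language="en", out_path="output.wav"):
    """Synthesize `text` in the voice of `reference_wav` using XTTS v2."""
    if clip_seconds(reference_wav) < MIN_REFERENCE_SECONDS:
        raise ValueError("reference clip is shorter than six seconds")

    # Imported lazily so the duration check works without the TTS package.
    from TTS.api import TTS

    tts = TTS(XTTS_V2)
    tts.tts_to_file(
        text=text,
        speaker_wav=reference_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("Hello there.", "my_voice.wav", language="en")` would write the result to `output.wav`; the file name `my_voice.wav` is a placeholder for your own recording.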


Details

Category: Text-to-Speech (TTS)
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MPL-2.0 (toolkit code); Coqui Public Model License (XTTS model weights)
Added: May 7, 2026


Featured

Bark

Text-to-Speech (TTS)

Transformer-based text-to-audio model by Suno that generates speech, music, and sound effects.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate


Mentioned in

Open-Weight Text to Speech Models in 2026: The XTTS Successors

A working developer's comparison of Kokoro, Zonos, Kyutai TTS, F5-TTS, Piper, Chatterbox and the rest:...

Billy C


Related Tools

Kokoro TTS

ChatTTS

CosyVoice

CosyVoice 2

EmotiVoice

Bark
