Text-to-Speech (TTS) AI Tools

Open-source text-to-speech engines and models for generating natural-sounding speech from text input.

Synthesizing speech from text was a solved, boring problem for years, handled by rule-based engines behind screen readers and phone trees. It stopped being boring once LLM agents needed a voice. The people reaching for these projects are building accessibility layers, audiobook and dubbing pipelines, game dialogue, and the output leg of voice agents.

Three camps split the field in 2026. The small neural and classical end (eSpeak NG, Festival, VITS, and Matcha-TTS) gives deterministic output in tens of megabytes on a CPU, at the cost of flat prosody and no cloning. The speech-LLM camp (Orpheus TTS, Llasa, Spark-TTS, and Sesame CSM) treats audio as tokens on a Llama or Qwen backbone and gets emotion and zero-shot cloning from seconds of reference audio, but inherits sampling nondeterminism and a GPU floor. Between them sit non-autoregressive models such as F5-TTS (flow matching), MaskGCT (masked generation), and StyleTTS 2 (style diffusion), which drop autoregression for predictable latency.

A reasonable starting point is Kokoro TTS: 82M parameters, Apache-licensed, and fast enough on CPU for most batch narration. Chatterbox TTS covers expressive cloning, and Kyutai TTS handles the case where speech must start before the LLM finishes its reply. The trap is that weights and code often carry different licenses: XTTS ships its code under MPL 2.0 but its model weights under the non-commercial Coqui Public Model License (CPML), and several cloning models carry similar restrictions that the top-level LICENSE file never mentions. edge-tts is a separate surprise: it is a client for a Microsoft online endpoint rather than a local model, so it needs network access to work.
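The license trap described above is easy to miss when shortlisting projects, so it can help to record the code license and the weights license as separate fields and check both. The following is a minimal sketch of that idea, not a legal tool: the `TTSProject` record, the `COMMERCIAL_OK` set, and the `commercially_usable` helper are hypothetical names introduced here, and the license strings are illustrative entries that should be verified against each project's repository and model card.

```python
from dataclasses import dataclass

# Licenses this sketch treats as acceptable for commercial use.
# This set is an assumption for illustration; adjust it to your own policy.
COMMERCIAL_OK = {"Apache-2.0", "MIT", "BSD-3-Clause", "MPL-2.0"}


@dataclass(frozen=True)
class TTSProject:
    """Hypothetical catalog record that keeps the two licenses apart."""
    name: str
    code_license: str     # license of the repository source code
    weights_license: str  # license of the released model weights


def commercially_usable(project: TTSProject) -> bool:
    """A project passes only if BOTH code and weights are on the allow-list."""
    return (
        project.code_license in COMMERCIAL_OK
        and project.weights_license in COMMERCIAL_OK
    )


# Illustrative entries; verify the license strings before relying on them.
CATALOG = [
    TTSProject("Kokoro TTS", code_license="Apache-2.0", weights_license="Apache-2.0"),
    TTSProject("XTTS", code_license="MPL-2.0", weights_license="CPML"),
]

usable = [p.name for p in CATALOG if commercially_usable(p)]
blocked = [p.name for p in CATALOG if not commercially_usable(p)]

print("usable:", usable)
print("blocked:", blocked)
```

Checking only `code_license` would let XTTS through, which is exactly the mistake the paragraph warns about; requiring both fields to pass makes the weights restriction visible.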

Kokoro TTS

GPT-SoVITS

Bark

CosyVoice 2

EmotiVoice

eSpeak NG

F5-TTS

Festival

Fish Speech

Dia TTS

IndexTTS

So-VITS-SVC

Chatterbox TTS

VoiceCraft

FishAudio S1-mini

Chatterbox TTS (Resemble)

Matcha-TTS

MeloTTS

MetaVoice

Moshi

Mozilla TTS

OpenVoice

OuteTTS

Sherpa-ONNX TTS

StyleTTS 2

VALL-E X

WhisperSpeech

XTTS

Mimic 3

Tortoise TTS

Parler-TTS

VibeVoice

NeuTTS Air

edge-tts

FireRedTTS 2

Higgs Audio

KittenTTS

Kyutai TTS

Llasa

MaskGCT

MegaTTS3

Orpheus TTS

RealtimeTTS

Sesame CSM

Spark-TTS

VITS

Zonos

ChatTTS

CosyVoice
