Matcha-TTS

Fast TTS with conditional flow matching for efficient speech synthesis.

Open Source, Self Hosted, Offline Capable, GPU Required (4GB+ VRAM)

About

Developed at KTH Royal Institute of Technology and published at ICASSP 2024, Matcha-TTS is a non-autoregressive text-to-speech model that uses optimal-transport conditional flow matching, a technique similar to rectified flows, to cut the number of ODE solver steps needed for synthesis. The result is fast, probabilistic speech generation with a compact memory footprint and natural-sounding output, with controls for speaking rate, sampling temperature, and solver step count to trade speed against quality. The package installs with pip and offers a command line tool, a Gradio web demo, and a Jupyter notebook, plus full training scripts for custom datasets and ONNX export for deployment. Released as open source under the MIT license, it serves speech researchers as a flow-matching baseline and developers who need lightweight local TTS, and its approach has influenced later speech systems that adopt flow matching.
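The link between flow matching and a low solver step count can be illustrated with a toy example. The sketch below is not Matcha-TTS code: it is a one-dimensional stand-in in which the closed-form conditional vector field of optimal-transport flow matching replaces the trained network, and the names `conditional_field`, `euler_sample`, and `SIGMA_MIN` are hypothetical. It assumes the temperature control scales the initial Gaussian noise. Because the conditional path from noise to target is a straight line, a fixed-step Euler solver reaches the same endpoint with one step as with many. A learned model only approximates this behaviour, which is why the step count remains a speed/quality control in practice.

```python
import random

# Assumed small constant for the width of the path at t = 1 (illustrative value).
SIGMA_MIN = 1e-4


def conditional_field(x, t, x1, sigma_min=SIGMA_MIN):
    """Closed-form OT conditional vector field that transports noise toward x1.

    In a real model a neural network approximates this field; here the exact
    expression is used so the example stays self-contained.
    """
    return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)


def euler_sample(x1, n_steps, temperature=1.0, seed=0):
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with fixed-step Euler.

    Returns (x0, x_final). The temperature scales the starting noise, which
    is an assumption about how such a control is typically applied.
    """
    rng = random.Random(seed)
    x0 = temperature * rng.gauss(0.0, 1.0)
    x = x0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * conditional_field(x, t, x1)
    return x0, x


if __name__ == "__main__":
    # The endpoint barely changes with the number of solver steps.
    for steps in (1, 2, 10, 50):
        start, end = euler_sample(x1=2.0, n_steps=steps)
        print(steps, start, end)
```

Lowering `temperature` shrinks the starting noise and therefore the spread of the generated samples, while `n_steps` sets how many field evaluations each sample costs.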

Details

Category: Text-to-Speech (TTS)
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
License: MIT
Minimum VRAM: 4 GB
Added: Apr 3, 2026

Tags

tts flow-matching fast non-autoregressive research

Related Tools

Featured

Kokoro TTS

Text-to-Speech (TTS)

Lightweight and expressive TTS model with 82M parameters for fast local inference.

Open Source, Self Hosted, Offline

Easy

4.0 (1)

ChatTTS

Text-to-Speech (TTS)

Conversational TTS model optimized for dialogue and chat applications.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)

CosyVoice

Text-to-Speech (TTS)

Multilingual large voice generation model with full-stack inference, training, and deployment.

Open Source, Self Hosted, Offline, GPU

Intermediate

0.0 (0)

CosyVoice 2

Text-to-Speech (TTS)

Large-scale multilingual TTS model by Alibaba with zero-shot voice cloning.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

0.0 (0)

EmotiVoice

Text-to-Speech (TTS)

Emotion-controllable TTS engine by NetEase with 2000+ voices.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)

Featured

Bark

Text-to-Speech (TTS)

Transformer-based text-to-audio model by Suno that generates speech, music, and sound effects.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)
