OuteTTS
Pure language modeling approach to TTS without traditional audio codecs.
About
OuteTTS is a text-to-speech model that treats speech as a pure language modeling problem: it generates audio as a sequence of tokens through next-token prediction rather than through a traditional multi-stage synthesis pipeline. It supports voice cloning, is lightweight, and ships as Python and npm packages that run through llama.cpp bindings or Transformers. OuteAI released it as an open model.
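To make the "speech as next-token prediction" idea concrete, here is a minimal sketch of an autoregressive sampling loop over discrete audio tokens. It is not OuteTTS code and does not use the OuteTTS API: the vocabulary, the `toy_logits` scoring function, and the `<audio>` markers are all hypothetical stand-ins for a real language model and tokenizer. Turning the resulting tokens into a waveform is out of scope here.

```python
import math
import random

# Hypothetical toy vocabulary: markers plus a few discrete "audio" tokens.
AUDIO_START, AUDIO_END = "<audio>", "</audio>"
AUDIO_TOKENS = [f"<a{i}>" for i in range(8)]


def toy_logits(context):
    """Stand-in for a language model: score every candidate next token.

    A real model would run a transformer over `context`. This toy version
    prefers the audio token that cyclically follows the previous one and
    makes the end marker more likely as the audio span grows.
    """
    n_audio = sum(tok in AUDIO_TOKENS for tok in context)
    last = context[-1]
    scores = {}
    for i, tok in enumerate(AUDIO_TOKENS):
        prev = AUDIO_TOKENS[i - 1]  # wraps around for i == 0
        scores[tok] = 2.0 if last == prev else 0.0
    scores[AUDIO_END] = -3.0 + 0.5 * n_audio
    return scores


def sample(scores, rng, temperature=1.0):
    """Softmax sampling over a dict mapping token -> logit."""
    tokens = list(scores)
    weights = [math.exp(scores[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate_audio_tokens(text_tokens, max_new_tokens=32, seed=0):
    """Append audio tokens one at a time, each conditioned on the context."""
    rng = random.Random(seed)
    # The text prompt and the generated audio share a single token sequence.
    context = list(text_tokens) + [AUDIO_START]
    generated = []
    for _ in range(max_new_tokens):
        nxt = sample(toy_logits(context), rng)
        if nxt == AUDIO_END:
            break
        generated.append(nxt)
        context.append(nxt)
    return generated


if __name__ == "__main__":
    print(generate_audio_tokens(["hello", "world"]))
```

The point of the sketch is that text and audio sit in one token stream, so the same decoding loop used for text generation also produces speech tokens. Only the vocabulary differs.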
Details
- Category: Text-to-Speech (TTS)
- Price: Free
- Platform: Local/Desktop
- Difficulty: Intermediate (3/5)
- Minimum VRAM: 4 GB
- Added: Apr 3, 2026
Related Tools
Lightweight and expressive TTS model with 82M parameters for fast local inference.
Open-source TTS model by Resemble AI with emotion and accent control.
Expressive zero-shot TTS model by Resemble AI with emotion and accent control.
Singing voice conversion model based on VITS and SoftVC for voice-to-voice transfer.
Zero-shot TTS model with high naturalness and speaker similarity.
Transformer-based text-to-audio model by Suno that generates speech, music, and sound effects.