Featured Tool

GPT-SoVITS

Few-shot voice cloning and TTS combining GPT and SoVITS architectures.

Open Source, Self Hosted, Offline Capable, GPU Required (6GB+ VRAM)

About

GPT-SoVITS pairs a GPT-style semantic model with SoVITS voice synthesis for few-shot voice cloning and text-to-speech. Given only about five seconds of reference audio, it performs zero-shot cloning; with roughly one minute of training data, it can be fine-tuned for markedly better speaker similarity. It also supports voice conversion between speakers. The system covers Chinese, English, Japanese, Korean, and Cantonese, and it supports cross-lingual inference, synthesizing speech in a language other than the one the voice was trained on. A WebUI bundles the whole workflow, including tools for vocal separation, audio slicing, ASR-based labeling, and dataset preparation, so non-experts can train a voice end to end. The model is pretrained on around 5,000 hours of speech. It runs fastest on NVIDIA GPUs, with a real-time factor (synthesis time divided by output audio duration) near 0.03 on an RTX 4060 Ti, and it also supports CPU and Apple Silicon. Released under the MIT license, it is a common choice for character voices, dubbing, and hobbyist voice cloning.
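
To put the quoted real-time factor in concrete terms, here is a minimal sketch of the arithmetic. It assumes the conventional definition of real-time factor (time spent synthesizing divided by the duration of the audio produced), which this page does not spell out. The function names are hypothetical helpers for illustration and are not part of GPT-SoVITS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by output audio duration.

    Assumed convention: values below 1.0 mean faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.03) -> float:
    """Estimate how long it takes to generate a clip at a given real-time factor.

    The default of 0.03 is the approximate RTX 4060 Ti figure quoted above;
    actual speed depends on hardware, model version, and settings.
    """
    if audio_seconds < 0 or rtf < 0:
        raise ValueError("audio_seconds and rtf must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    # One minute of speech at the quoted real-time factor.
    seconds = estimated_synthesis_seconds(60.0)
    print(f"Estimated synthesis time: {seconds:.2f} s")
```

Under these assumptions, one minute of speech at a real-time factor of 0.03 would take roughly 1.8 seconds to generate, and a measured run of 0.3 seconds for a 10-second clip would correspond to the same factor.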

Details

Category: Text-to-Speech (TTS)
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
Minimum VRAM: 6 GB
Added: Apr 3, 2026

Tags

tts voice-cloning few-shot gpt sovits webui

Related Tools

Featured

Kokoro TTS

Text-to-Speech (TTS)

Lightweight and expressive TTS model with 82M parameters for fast local inference.

Open Source, Self Hosted, Offline

Easy

4.0 (1)

ChatTTS

Text-to-Speech (TTS)

Conversational TTS model optimized for dialogue and chat applications.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)

CosyVoice

Text-to-Speech (TTS)

Multilingual large voice generation model with full-stack inference, training, and deployment.

Open Source, Self Hosted, Offline, GPU

Intermediate

0.0 (0)

CosyVoice 2

Text-to-Speech (TTS)

Large-scale multilingual TTS model by Alibaba with zero-shot voice cloning.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

0.0 (0)

EmotiVoice

Text-to-Speech (TTS)

Emotion-controllable TTS engine by NetEase with 2000+ voices.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)

Featured

Bark

Text-to-Speech (TTS)

Transformer-based text-to-audio model by Suno that generates speech, music, and sound effects.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

0.0 (0)
