VoiceCraft

Zero-shot speech editing and TTS using neural codec language models.

Open Source, Self Hosted, Offline Capable, GPU Required (8 GB+ VRAM)

About

VoiceCraft, from UT Austin, is a token-infilling neural codec language model that performs both speech editing and zero-shot text-to-speech on in-the-wild audio such as audiobooks, podcasts, and videos. It can edit an existing recording or clone an unseen voice from a few seconds of reference audio. The project ships code and demos and is distributed as open-source research. Inference requires a GPU with around 8 GB of VRAM.
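
To make "token infilling" more concrete, here is a toy sketch of the general idea: the span to be edited is cut out of the token sequence, replaced by a placeholder, and appended at the end, so that a left-to-right model can see the context on both sides before generating the span. This is an illustration under simplifying assumptions, not VoiceCraft's code: the names `MASK`, `rearrange_for_infilling`, and `restore` are hypothetical, the tokens are plain strings rather than multi-codebook audio codec tokens, and only a single span is handled.

```python
from typing import List, Tuple

MASK = "<M1>"  # hypothetical placeholder token, chosen for this sketch only


def rearrange_for_infilling(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Move tokens[start:end] to the end of the sequence, leaving a placeholder."""
    start, end = span
    if not (0 <= start < end <= len(tokens)):
        raise ValueError("span must satisfy 0 <= start < end <= len(tokens)")
    prefix, target, suffix = tokens[:start], tokens[start:end], tokens[end:]
    # The first MASK marks where the span used to be; the second MASK
    # introduces the span itself, now placed after all of its context.
    return prefix + [MASK] + suffix + [MASK] + target


def restore(rearranged: List[str]) -> List[str]:
    """Undo the rearrangement: put the trailing span back at the placeholder."""
    first = rearranged.index(MASK)
    second = rearranged.index(MASK, first + 1)
    prefix = rearranged[:first]
    suffix = rearranged[first + 1:second]
    target = rearranged[second + 1:]
    return prefix + target + suffix


if __name__ == "__main__":
    original = ["a", "b", "c", "d", "e"]
    moved = rearrange_for_infilling(original, (1, 3))
    print(moved)
    print(restore(moved))
```

In an editing setting, a model would generate new tokens in place of the trailing span, and a step like `restore` would splice them back into position. The sketch assumes the placeholder string never occurs among the ordinary tokens.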

Details

Category: Text-to-Speech (TTS)
Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Tags

tts speech-editing voice-cloning neural-codec research

Related Tools

Featured

Kokoro TTS

Text-to-Speech (TTS)

Lightweight and expressive TTS model with 82M parameters for fast local inference.

Open Source, Self Hosted, Offline

Easy

4.0 (1)

ChatTTS

Text-to-Speech (TTS)

Conversational TTS model optimized for dialogue and chat applications.

Open Source, Self Hosted, Offline, GPU 4 GB+

Intermediate

CosyVoice

Text-to-Speech (TTS)

Multilingual large voice generation model with full-stack inference, training, and deployment.

Open Source, Self Hosted, Offline, GPU

Intermediate

CosyVoice 2

Text-to-Speech (TTS)

Large-scale multilingual TTS model by Alibaba with zero-shot voice cloning.

Open Source, Self Hosted, Offline, GPU 8 GB+

Advanced

EmotiVoice

Text-to-Speech (TTS)

Emotion-controllable TTS engine by NetEase with 2000+ voices.

Open Source, Self Hosted, Offline, GPU 4 GB+

Intermediate

Featured

Bark

Text-to-Speech (TTS)

Transformer-based text-to-audio model by Suno that generates speech, music, and sound effects.

Open Source, Self Hosted, Offline, GPU 4 GB+

Intermediate
