AudioLDM 2

Latent diffusion model for text-to-audio, music, and speech generation.

Open Source, Self Hosted, Offline Capable, GPU Required (8GB+ VRAM)

About

AudioLDM 2 is a unified latent diffusion model for text-to-audio, text-to-music, and text-to-speech generation. It uses a shared representation space across these audio types and generates high-quality audio from text descriptions. It requires a GPU with at least 8 GB of VRAM. Developed by CVSSP, University of Surrey.
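As a rough illustration of generating audio from a text description, here is a minimal sketch. It assumes the Hugging Face `diffusers` integration (`AudioLDM2Pipeline`) and the `cvssp/audioldm2` checkpoint, neither of which is named in this listing. The helper names, default values, and output path are hypothetical, and the 8 GB threshold is interpreted here as 8 GiB.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: the cvssp/audioldm2 checkpoint outputs 16 kHz audio
MIN_VRAM_GB = 8          # minimum VRAM stated in this listing


def meets_vram_requirement(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Return True if a GPU with `total_bytes` of memory meets the minimum.

    Hypothetical helper; treats 1 GB as 1024**3 bytes.
    """
    return total_bytes >= min_gb * 1024**3


def generate(prompt: str, seconds: float = 10.0, steps: int = 200,
             out_path: str = "audioldm2_out.wav", seed: int = 0) -> str:
    """Generate one clip from a text prompt and write it to a WAV file.

    Hypothetical wrapper around the diffusers pipeline; not the authors' own script.
    """
    # Heavy imports stay local so the VRAM helper above works without them.
    import scipy.io.wavfile
    import torch
    from diffusers import AudioLDM2Pipeline

    if not torch.cuda.is_available():
        raise RuntimeError("This sketch assumes a CUDA GPU is available.")
    total = torch.cuda.get_device_properties(0).total_memory
    if not meets_vram_requirement(total):
        raise RuntimeError(f"GPU has less than {MIN_VRAM_GB} GB of VRAM.")

    # The first call downloads the weights; later runs can use the local cache.
    pipe = AudioLDM2Pipeline.from_pretrained(
        "cvssp/audioldm2", torch_dtype=torch.float16
    ).to("cuda")

    # A seeded generator makes the result repeatable for a given prompt.
    generator = torch.Generator("cuda").manual_seed(seed)
    audio = pipe(
        prompt,
        num_inference_steps=steps,
        audio_length_in_s=seconds,
        generator=generator,
    ).audios[0]

    # Cast to float32 so the WAV writer gets a dtype it supports.
    scipy.io.wavfile.write(out_path, rate=SAMPLE_RATE_HZ, data=audio.astype("float32"))
    return out_path
```

A call such as `generate("Rain falling on a tin roof", seconds=5.0)` would write a short clip to the default path. Fewer inference steps generally trade quality for speed.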

Details

Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Related Tools

Fast music generation model producing full songs with lyrics in seconds.

Open Source, Self Hosted, Offline, GPU 8GB+
Intermediate

Open-source toolkit for audio, music, and speech generation research.

Open Source, Self Hosted, Offline, GPU 8GB+
Advanced

Original latent diffusion model for text-to-audio generation.

Open Source, Self Hosted, Offline, GPU 8GB+
Intermediate

State-of-the-art music source separation model by Meta for splitting tracks.

Open Source, Self Hosted, Offline
Easy

High-fidelity neural audio codec by Meta for audio compression and tokenization.

Open Source, Self Hosted, Offline
Intermediate

Updated music generation model with improved quality and longer generation.

Open Source, Self Hosted, Offline, GPU 8GB+
Intermediate