Moûsai

Text-to-music generation model using cascaded latent diffusion.

Open Source · Self Hosted · Offline Capable · GPU Required (8GB+ VRAM)

About

Moûsai is a research text-to-music model that uses cascaded latent diffusion to generate long-form, 48 kHz stereo music from text descriptions. In the cascade, a first stage compresses audio into a compact latent representation with a diffusion autoencoder, and a second, text-conditioned diffusion model generates music in that latent space. Moûsai is built on audio-diffusion-pytorch, a customizable, waveform-based diffusion library that supports unconditional and text-conditional generation, diffusion autoencoding, upsampling, and vocoding. The library is written for PyTorch, and the pretrained weights follow the configuration described in the paper. The project is an open-source research release.
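To make the diffusion part concrete, here is a minimal plain-Python sketch of a single diffusion stage, assuming the v-objective with a cosine noise schedule and a deterministic DDIM-style sampler. It is not Moûsai's or audio-diffusion-pytorch's code: the autoencoder stage of the cascade, the U-Net, and text conditioning are all omitted. The names `alpha_sigma`, `add_noise`, `sample`, and `oracle` are hypothetical, and `oracle` stands in for a trained network by simply knowing the clean signal, so the sampler should reconstruct it almost exactly.

```python
import math
import random
from typing import Callable, List, Tuple

# Toy sketch of ONE diffusion stage (assumed v-objective + cosine schedule).
# All names here are hypothetical and are not part of audio-diffusion-pytorch.
Vector = List[float]


def alpha_sigma(t: float) -> Tuple[float, float]:
    """Cosine schedule: t=0 is clean audio, t=1 is pure noise."""
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


def add_noise(x0: Vector, noise: Vector, t: float) -> Tuple[Vector, Vector]:
    """Forward process: noisy signal x_t and the v-target a model would learn."""
    a, s = alpha_sigma(t)
    x_t = [a * x + s * n for x, n in zip(x0, noise)]
    v = [a * n - s * x for x, n in zip(x0, noise)]
    return x_t, v


def sample(model: Callable[[Vector, float], Vector], noise: Vector, num_steps: int) -> Vector:
    """Deterministic DDIM-style sampler in v-space, stepping t from 1 down to 0."""
    x = list(noise)
    ts = [1 - i / num_steps for i in range(num_steps + 1)]  # 1.0, ..., 0.0
    for t, t_next in zip(ts[:-1], ts[1:]):
        a, s = alpha_sigma(t)
        a_next, s_next = alpha_sigma(t_next)
        v = model(x, t)
        x0_pred = [a * xi - s * vi for xi, vi in zip(x, v)]     # predicted clean audio
        noise_pred = [s * xi + a * vi for xi, vi in zip(x, v)]  # predicted noise
        # Re-noise the clean prediction to the next (lower) noise level.
        x = [a_next * c + s_next * n for c, n in zip(x0_pred, noise_pred)]
    return x


# Usage: a 64-sample, 440 Hz tone at 48 kHz stands in for (latent) audio.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * i / 48_000) for i in range(64)]
noise = [random.gauss(0.0, 1.0) for _ in clean]


def oracle(x_t: Vector, t: float) -> Vector:
    """Hypothetical perfect denoiser: knows `clean`, returns v = (alpha*x_t - x0) / sigma."""
    a, s = alpha_sigma(t)
    return [(a * xt - c) / s for xt, c in zip(x_t, clean)]


out = sample(oracle, noise, num_steps=10)
max_err = max(abs(o - c) for o, c in zip(out, clean))
print(f"max reconstruction error: {max_err:.2e}")
```

In a real system, `oracle` would be replaced by a trained network that predicts `v` from the noisy input (plus a text embedding for text-to-music), while the sampling loop in this sketch would keep the same shape.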

Details

Price: Free
Platform: Local/Desktop
Difficulty: Advanced (4/5)
Minimum VRAM: 8 GB
Added: Apr 3, 2026

Related Tools

Fast music generation model producing full songs with lyrics in seconds.

Open Source · Self Hosted · Offline · GPU 8GB+
Intermediate

Open-source toolkit for audio, music, and speech generation research.

Open Source · Self Hosted · Offline · GPU 8GB+
Advanced

Original latent diffusion model for text-to-audio generation.

Open Source · Self Hosted · Offline · GPU 8GB+
Intermediate

State-of-the-art music source separation model by Meta for splitting tracks.

Open Source · Self Hosted · Offline
Easy

High-fidelity neural audio codec by Meta for audio compression and tokenization.

Open Source · Self Hosted · Offline
Intermediate

Updated music generation model with improved quality and longer generation.

Open Source · Self Hosted · Offline · GPU 8GB+
Intermediate