DiffRhythm

Full-length song generation model using diffusion with lyrics and style conditioning.

Open Source, Self Hosted, Offline Capable, GPU Required (12GB+ VRAM)

About

DiffRhythm generates complete songs, vocals and instrumentals together, from lyrics plus either a text description of the musical style or a reference audio clip. Developed by the ASLP lab at Northwestern Polytechnical University, its authors describe it as the first open-sourced diffusion-based music generation model capable of full-length songs. The system builds on latent diffusion, and the v1.2 release comes in two variants: a base model producing tracks around 1 minute 35 seconds and a full model reaching about 4 minutes 45 seconds. Generation needs a GPU with at least 8 GB of VRAM when chunked decoding is enabled, more without it. Code and weights are released under the Apache 2.0 license, which permits modification and commercial use with attribution. Researchers studying music generation and developers or hobbyists producing AI-assisted songs are the typical users, and the lyric conditioning makes it notable among open music models that usually output instrumentals only.