Audio Diffusion Models
Text-to-music, text-to-audio, and sound effect generation
Generate music, speech, and audio effects from text descriptions using diffusion-based and related generative models. The lineup ranges from AudioLDM for general audio generation, to MusicGen for text-to-music, to Stable Audio for professional-quality output.
Audio Generation Models
Generate diverse audio content from text descriptions.
AudioLDM / AudioLDM2
Latent diffusion models for text-to-audio with CLAP conditioning.
Bark
Transformer-based model generating speech, music, and sound effects.
MusicGen
Meta's text-to-music model with melody conditioning and multi-track support.
MusicLDM
Latent diffusion specifically designed for music generation.
Riffusion
Real-time music generation through spectrogram diffusion.
Stable Audio
Stability AI's model for high-quality audio and music generation.
AudioGen
Auto-regressive audio generation from text descriptions.
MAGNeT
Masked Generative Non-autoregressive audio Transformer.
JEN-1
Omnidirectional diffusion model for high-fidelity music generation.
Make-An-Audio
Multi-modal audio generation with temporal and spectral conditions.
Noise2Music
Cascaded diffusion model for music from text descriptions.
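Several of the models above (AudioLDM, MusicLDM, Riffusion, JEN-1, Noise2Music) are described as diffusion models. As a rough illustration of what that means, the sketch below shows the forward noising step that DDPM-style diffusion training typically relies on: a clean signal is blended with Gaussian noise according to a noise schedule, and a network is then trained to undo that noise. This is a minimal, self-contained sketch and does not use AiDotNet's API. The class name ForwardDiffusionSketch, its method names, and the schedule values are hypothetical choices for illustration, not taken from any of the listed models.

```csharp
using System;

// Hypothetical illustration only; not part of AiDotNet.
public static class ForwardDiffusionSketch
{
    // Linear noise schedule: beta rises from betaStart to betaEnd (illustrative defaults).
    public static double[] LinearBetas(int steps, double betaStart = 1e-4, double betaEnd = 0.02)
    {
        var betas = new double[steps];
        for (int t = 0; t < steps; t++)
            betas[t] = betaStart + (betaEnd - betaStart) * t / Math.Max(1, steps - 1);
        return betas;
    }

    // alphaBar[t] = product over s <= t of (1 - beta[s]); the fraction of signal variance kept at step t.
    public static double[] AlphaBars(double[] betas)
    {
        var alphaBars = new double[betas.Length];
        double running = 1.0;
        for (int t = 0; t < betas.Length; t++)
        {
            running *= 1.0 - betas[t];
            alphaBars[t] = running;
        }
        return alphaBars;
    }

    // Standard normal sample via the Box-Muller transform.
    public static double NextGaussian(Random rng)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids log(0)
        double u2 = rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // x_t = sqrt(alphaBar) * x_0 + sqrt(1 - alphaBar) * noise
    public static double[] AddNoise(double[] x0, double[] noise, double alphaBar)
    {
        if (x0.Length != noise.Length)
            throw new ArgumentException("x0 and noise must have the same length.");
        double signalScale = Math.Sqrt(alphaBar);
        double noiseScale = Math.Sqrt(1.0 - alphaBar);
        var xt = new double[x0.Length];
        for (int i = 0; i < x0.Length; i++)
            xt[i] = signalScale * x0[i] + noiseScale * noise[i];
        return xt;
    }
}
```

In a latent diffusion model, x0 would be a compressed latent representation of the audio rather than raw samples, and conditioning such as a text embedding would be passed to the denoising network rather than to this noising step.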
Audio generation with AiModelBuilder
using AiDotNet;

// Train an audio generation model with AiModelBuilder.
// audioData, labels, and promptEmbedding are assumed to be prepared by the caller.
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new MusicGen<float>(duration: 30))
    .ConfigureOptimizer(new AdamOptimizer<float>())
    .ConfigurePreprocessing()
    .BuildAsync(audioData, labels);

// Generate audio from a prompt embedding
var audio = result.Predict(promptEmbedding);

Start building with Audio Diffusion Models
All 7 implementations are included free under Apache 2.0.