| Riffusion | |
|---|---|
| Developer(s) | Seth Forsgren, Hayk Martiros |
| Initial release | December 15, 2022 |
| Repository | GitHub |
| Written in | Python |
| Type | Text-to-image model |
| License | MIT License |
| Website | riffusion.com |
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio.[1]
The resulting music has been described as "otherworldly",[2] although unlikely to replace human-made music.[2] The model was made available on December 15, 2022, with the code also freely available on GitHub.[3]
The first version of Riffusion was created by fine-tuning Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms.[1] The resulting model takes a text prompt and generates a spectrogram image, which can then be put through an inverse Fourier transform and converted into an audio file.[3] Although each clip is only a few seconds long, the model can also interpolate between outputs in latent space to blend different clips together[1][4] (using Stable Diffusion's img2img capabilities).[5] It was one of many models derived from Stable Diffusion.[5]
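The image-to-audio step can be sketched with standard audio tooling. The snippet below is a minimal, illustrative reconstruction rather than Riffusion's actual pipeline: it assumes the generated image encodes a mel spectrogram on a decibel scale, and uses librosa's Griffin-Lim-based inversion to estimate the phase information the image does not store. The file names, dB range, and FFT parameters are assumptions.

```python
# Minimal sketch (not Riffusion's actual code): convert a generated grayscale
# spectrogram image back into audio. All constants below are illustrative assumptions.
import numpy as np
from PIL import Image
import librosa
import soundfile as sf

SAMPLE_RATE = 44100   # assumed output sample rate
N_FFT = 2048          # assumed FFT size used when the spectrogram was rendered
HOP_LENGTH = 512      # assumed hop length

# Load the generated spectrogram image; pixel brightness is assumed to encode
# magnitude in decibels (brighter = louder).
img = np.array(Image.open("generated_spectrogram.png").convert("L"), dtype=np.float32)
img = np.flipud(img)                 # image rows run top-down; frequency bins run bottom-up
db = img / 255.0 * 80.0 - 80.0       # map pixel values [0, 255] to [-80, 0] dB (assumed scaling)
mel_power = librosa.db_to_power(db)  # back to a linear-power mel spectrogram

# Invert the mel spectrogram to a waveform; librosa applies Griffin-Lim internally
# to reconstruct the phase that the magnitude-only image lacks.
audio = librosa.feature.inverse.mel_to_audio(
    mel_power, sr=SAMPLE_RATE, n_fft=N_FFT, hop_length=HOP_LENGTH
)
sf.write("clip.wav", audio, SAMPLE_RATE)
```

Because the image stores only magnitudes, any inversion of this kind must estimate the phase, which is why Griffin-Lim (or a neural vocoder) is typically used for the final reconstruction.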
In December 2022, Mubert[6] similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on its own text-to-music generator called MusicLM.[7][8]
Forsgren and Martiros formed a startup, also called Riffusion, and raised $4 million in venture capital funding in October 2023.[9][10]