They adapt the Stable Diffusion AI model to generate music from text

Developers have adapted the artificial intelligence (AI) model Stable Diffusion to create spectrograms that can be turned into audio or music clips from text.

Stable Diffusion is a text-to-image machine learning model developed by Stability AI, used to generate high-quality digital images from text.

Two developers, Seth Forsgren and Hayk Martiros, have created a project called Riffusion that adapts this solution to music. With it, spectrograms can be generated that can in turn be translated into audio clips.
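In practice, the generation step works like ordinary text-to-image inference, except that the output image is a spectrogram. The following is a minimal sketch using the diffusers library; the checkpoint name ("riffusion/riffusion-model-v1"), the prompt, and the file names are illustrative assumptions rather than the project's exact code.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned checkpoint (assumed to be published as
# "riffusion/riffusion-model-v1" on the Hugging Face Hub).
pipe = StableDiffusionPipeline.from_pretrained(
    "riffusion/riffusion-model-v1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a sufficiently powerful GPU is assumed

# The prompt describes the desired music; the output is a spectrogram image.
image = pipe("funky jazz saxophone solo", height=512, width=512).images[0]
image.save("spectrogram.png")
```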

As the creators of this project explain on their website, an audio spectrogram or sonogram is a visual representation of the frequency content of a sound clip; in Riffusion, these images are generated from the text prompts entered by the user.

These sonograms have two axes: X, which represents time, and Y, which represents frequency. The color of each pixel in the spectrogram encodes the amplitude of the audio at that point. This is precisely what Torchaudio relies on when it takes the image generated by Stable Diffusion and converts it into audio.
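A rough sketch of that image-to-audio step is shown below, using torchaudio's Griffin-Lim transform to recover a waveform from a magnitude-only image. The file name, amplitude scaling, and STFT parameters are illustrative assumptions, not the project's actual values.

```python
import numpy as np
import torch
import torchaudio
from PIL import Image

def spectrogram_image_to_audio(path: str) -> torch.Tensor:
    # Load the image: the X axis is time, the Y axis is frequency,
    # and the brightness of each pixel encodes amplitude.
    img = Image.open(path).convert("L")
    data = np.asarray(img, dtype=np.float32) / 255.0

    # Flip vertically so that low frequencies occupy the first rows,
    # matching the layout torchaudio expects for a spectrogram.
    magnitudes = torch.from_numpy(np.ascontiguousarray(data[::-1, :]))

    # Undo a simple amplitude compression (an illustrative assumption;
    # the real project uses its own scaling).
    magnitudes = magnitudes ** 2 * 30.0

    # Griffin-Lim expects freq bins = n_fft // 2 + 1, so derive n_fft from
    # the image height; it then iteratively estimates the phase information
    # that the image does not store.
    n_fft = 2 * (magnitudes.shape[0] - 1)
    griffin_lim = torchaudio.transforms.GriffinLim(n_fft=n_fft, hop_length=n_fft // 4)
    return griffin_lim(magnitudes)

# Example usage: reconstruct a waveform and save it as a WAV file.
# waveform = spectrogram_image_to_audio("spectrogram.png")
# torchaudio.save("clip.wav", waveform.unsqueeze(0), 44100)
```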

Riffusion's creators note that it is not only possible to generate music from images and text, but also to combine, experiment with, and merge styles.

The developers have pointed out that, with a sufficiently powerful GPU, sonograms can be created as 512 x 512 pixel images corresponding to about five seconds of audio. In addition, infinite variations can be produced from the same original image.
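One way to produce such variations is to feed an existing spectrogram back through an image-to-image pipeline and change the random seed. The sketch below illustrates this under the same assumptions as the earlier examples; the checkpoint name, prompt, strength, and seed are hypothetical.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load the same (assumed) checkpoint in image-to-image mode.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "riffusion/riffusion-model-v1",
    torch_dtype=torch.float16,
).to("cuda")

# Start from an existing 512 x 512 spectrogram and vary it.
init_image = Image.open("spectrogram.png").convert("RGB")
generator = torch.Generator("cuda").manual_seed(42)  # change the seed for new variations

variation = pipe(
    "funky jazz saxophone solo, heavier drums",
    image=init_image,
    strength=0.6,  # how far the result may drift from the original spectrogram
    generator=generator,
).images[0]
variation.save("spectrogram_variation.png")
```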

Riffusion's web page currently includes a clip generator, as well as instructions and technical details on how to use this technology. The project's code is also available in its repository on GitHub.

By Editor
