Google’s AI music generator is like ChatGPT for audio

It can write 5-minute songs based on short text prompts.

Google has unveiled an advanced AI music generator that can turn a snippet of text into a song — but legal concerns might prevent the tech giant from ever sharing it with the public.

The AI revolution: ChatGPT, DALL-E 2, and other advanced AIs capable of generating impressive text or images in response to user prompts exploded in popularity in 2022, but they weren’t the first generative AIs, nor the only examples of what neural networks can do.

Several companies have also trained AIs to generate music in response to text, audio, or image prompts — OpenAI, the research firm behind ChatGPT and DALL-E 2, even released an AI music generator called “Jukebox” back in 2020.

These systems haven’t been as enthusiastically embraced as their text- and image-generating counterparts, though, mainly because their outputs aren’t as impressive — most are low-fidelity, simplistic, and lacking in traditional song structures, such as repeating choruses.

What’s new? Music-making AIs are improving, and perhaps the most impressive example of the technology is MusicLM, an AI music generator Google unveiled in January 2023.

The system can generate clips up to 5 minutes long based on text descriptions, and while the music isn’t going to win any Grammys, the audio does sound more like something a human might record than the clips generated by other AIs.

How it works: Google trained MusicLM on more than 280,000 hours of music. It also relied on MuLan, a separate model trained to link music to descriptions written in natural language, to connect text prompts to the audio.

They also created MusicCaps, a publicly accessible dataset of more than 5,500 music clips for evaluating the AI music generator. Expert musicians wrote a caption for each clip, along with a list of aspects describing it, such as its genre or mood.

During the evaluation stage, Google pitted MusicLM against two other text-to-music AIs — Mubert and Riffusion — using several quantitative metrics for assessing a clip’s audio quality and adherence to a text description. 

They also presented human evaluators with MusicCaps’ descriptions and two audio clips — these might be two clips produced by AIs or one AI-generated clip and the music upon which the MusicCaps description was based. The evaluators then chose which of the clips they thought best matched the description. 
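
To make the side-by-side setup concrete, here is a minimal sketch of one simple way such pairwise judgments could be tallied into a per-system win rate. It is an illustration, not Google’s actual scoring code: the `win_rates` function, the tuple format, and the "reference" label for the original music are all hypothetical, and the example judgments are made up rather than taken from the paper.

```python
from collections import Counter


def win_rates(judgments):
    """Return the share of pairwise comparisons each system won.

    judgments: iterable of (system_a, system_b, winner) tuples, where winner
    must be system_a or system_b. This format is a hypothetical assumption.
    """
    wins = Counter()
    appearances = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        # Each comparison counts once for both systems that were shown.
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {system: wins[system] / appearances[system] for system in appearances}


# Made-up judgments for illustration only; "reference" stands in for the
# human-made music a MusicCaps description was written about.
example = [
    ("MusicLM", "Mubert", "MusicLM"),
    ("MusicLM", "Riffusion", "MusicLM"),
    ("MusicLM", "reference", "reference"),
    ("Mubert", "Riffusion", "Riffusion"),
]
print(win_rates(example))
```

Counting wins over appearances keeps systems comparable even when they show up in different numbers of pairings, though a real study would also need many more judgments per pair.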

According to a paper Google shared on the preprint server arXiv (Agostinelli et al.), MusicLM outperformed the other AIs across the board.

Looking ahead: Google’s AI music generator may be able to produce audio that sounds closer to human-written music, but it still can’t replicate traditional song structures, and the vocals it creates are of particularly poor quality, with unintelligible lyrics.

Google says future work on the system could focus on those issues, on improving the overall quality of the audio, and on addressing the problem that’s preventing it from releasing MusicLM to the public: about 1% of its output can be approximately matched to audio in its training data.

“We acknowledge the risk of potential misappropriation of creative content associated to the use case … We strongly emphasize the need for more future work in tackling these risks associated to music generation,” the researchers wrote.
