Google’s AI music generator is like ChatGPT for audio

It can write 5-minute songs based on short text prompts.

Google has unveiled an advanced AI music generator that can turn a snippet of text into a song — but legal concerns might prevent the tech giant from ever sharing it with the public.

The AI revolution: ChatGPT, DALL-E 2, and other advanced AIs capable of generating impressive text or images in response to user prompts exploded in popularity in 2022, but they weren’t the first generative AIs, nor the only examples of what the neural networks can do.

Several companies have also trained AIs to generate music in response to text, audio, or image prompts — OpenAI, the research firm behind ChatGPT and DALL-E 2, even released an AI music generator called “Jukebox” back in 2020.

These systems haven’t been as enthusiastically embraced as their text- and image-generating counterparts, though, mainly because their outputs aren’t as impressive — most are low-fidelity, simplistic, and lacking in traditional song structures, such as repeating choruses.

What’s new? Music-making AIs are improving, and perhaps the most impressive example of the technology is MusicLM, an AI music generator Google unveiled in January 2023.

The system can generate clips up to 5 minutes long based on text descriptions, and while the music isn’t going to win any Grammys, the audio does sound more like something a human might record than the clips generated by other AIs.

How it works: Google trained MusicLM on a dataset of more than 280,000 hours of music. To connect text to audio, the system relies on MuLan, a model trained to link music to descriptions written in natural language.

The researchers then created MusicCaps, a publicly accessible dataset of more than 5,500 music clips, to evaluate the AI music generator. Expert musicians wrote a caption for each clip, as well as a list of aspects describing it, such as its genre or mood.

During the evaluation stage, Google pitted MusicLM against two other text-to-music AIs — Mubert and Riffusion — using several quantitative metrics for assessing a clip’s audio quality and adherence to a text description. 

They also presented human evaluators with a MusicCaps description and two audio clips: either two AI-generated clips, or one AI-generated clip and the music on which the description was based. The evaluators then chose which clip they thought best matched the description.
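
The pairwise listening test described above can be summarized by counting how often each source was preferred. The Python sketch below is a minimal illustration only: the `tally_wins` helper, the tuple layout, and the sample judgments are assumptions made for this example, not Google's evaluation code or data from the paper.

```python
from collections import Counter


def tally_wins(judgments):
    """Count how often each source's clip was preferred.

    judgments: iterable of (source_a, source_b, preferred) tuples, where
    preferred must be either source_a or source_b.
    """
    wins = Counter()
    for source_a, source_b, preferred in judgments:
        if preferred not in (source_a, source_b):
            raise ValueError(f"{preferred!r} was not one of the two clips shown")
        wins[preferred] += 1
    return wins


# Made-up judgments for illustration only; these are not results from the paper.
judgments = [
    ("MusicLM", "Mubert", "MusicLM"),
    ("MusicLM", "Riffusion", "MusicLM"),
    ("Mubert", "Riffusion", "Mubert"),
    ("MusicLM", "reference", "reference"),
]

wins = tally_wins(judgments)
for source, count in wins.most_common():
    print(f"{source}: {count} wins")
```

A raw win count like this favors sources that appear in more comparisons, so a fuller analysis would likely normalize by the number of times each source was shown.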

According to a paper by Agostinelli et al. that Google shared on the preprint server arXiv, MusicLM outperformed the other AIs across the board.

Looking ahead: Google’s AI music generator may produce audio that sounds closer to human-written music, but it still can’t replicate traditional song structures, and the vocals it creates are of particularly poor quality, with unintelligible lyrics.

Google says future work on the system could focus on those issues, on improving the overall quality of the audio, and on addressing the problem that’s preventing it from releasing MusicLM to the public: about 1% of its output can be approximately matched to audio in its training data.

“We acknowledge the risk of potential misappropriation of creative content associated to the use case … We strongly emphasize the need for more future work in tackling these risks associated to music generation,” the researchers wrote.

