Google’s new Gemini AI beats GPT-4 in 30 of 32 tests

But will the difference be enough to matter in real life?

Tech giant Google has finally unveiled its much-hyped Gemini AI, a series of generative AI models it claims are its “largest and most capable” to date. 

“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” said Google CEO Sundar Pichai. 

Multimodal AI: Generative AIs are algorithms trained to create original content in response to user prompts. OpenAI’s first iteration of ChatGPT, for example, can understand and produce human-like text, while its DALL-E 2 system can generate images based on text prompts. 

While those systems understand and generate just one type of content, a multimodal generative AI can work with several — in September, OpenAI announced a multimodal version of ChatGPT that could understand image, voice, and text inputs.

The Gemini era: According to Google, multimodal AIs are traditionally created by combining separate, specialized models into one program, but it took a different approach with its Gemini AI, training it to be multimodal from the start.

“This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state-of-the-art in nearly every domain,” wrote Demis Hassabis, CEO and cofounder of Google DeepMind.

In addition to being highly capable, Google says the Gemini AI is also its “most flexible” model. This has allowed the company to create three different sizes of the AI: Ultra, Nano, and Pro. 

  • Gemini Ultra is the most powerful model, designed for complex tasks. According to Google, it’s the first generative AI model to outperform human experts on the MMLU, a benchmark assessing knowledge across 57 subjects. Google is currently soliciting feedback on Ultra from select users, but expects to make it widely available in 2024.
  • Gemini Nano is the least capable model, but it’s small and efficient enough to run locally on smartphones. Google has already made it available on its Pixel 8 Pro — owners of that smartphone can use the AI to summarize audio recordings or generate responses to WhatsApp messages.
  • Gemini Pro, meanwhile, falls between Nano and Ultra in terms of capabilities and size. Google has integrated an English-language version of that model into its ChatGPT-like Bard, which will reportedly get an Ultra upgrade in 2024.

The big picture: Like the rest of the tech industry, Google has been racing to catch up with OpenAI in the generative AI space ever since the release of ChatGPT in 2022, and it’s been hyping the Gemini AI for months as the tech that will put it ahead. 

While Gemini did outperform OpenAI’s GPT-4 on 30 of 32 benchmarks tested (including the MMLU), the difference was often just a percentage point or two. Google may be ahead, but only by a little, and only compared to an AI model that has already been out for nine months.

“It’s clear that Gemini is a very sophisticated AI system … [but] it’s not obvious to me that Gemini is actually substantially more capable than GPT-4,” Melanie Mitchell, an AI researcher at the Santa Fe Institute in New Mexico, told MIT Technology Review.
