OpenAI’s GPT-4 outperforms doctors in another new study

It may "know" more about treating eye problems than your own GP.

OpenAI’s most powerful AI model outperformed junior doctors in deciding how to treat patients with eye problems and came close to scoring as high as expert ophthalmologists — at least on this test.

The challenge: When doctors are in med school, they rotate across clinical areas, spending time with specialists in surgery, psychiatry, ophthalmology, and more to ensure they’ll have a basic knowledge of all the subjects by the time they get their medical license.

If they become general practitioners (GPs), though, they may rarely use what they learned in some of those specialties or in treating less common conditions.

The idea: Researchers at the University of Cambridge were curious to see whether large language models (LLMs) — AIs that can understand and generate conversational text — could help GPs treat patients with eye problems, something they might not be handling on a day-to-day basis.

For a study published in PLOS Digital Health, they presented GPT-4 — the LLM powering OpenAI’s ChatGPT Plus — with 87 scenarios of patients with a range of eye problems and asked it to choose the best diagnosis or treatment from four options.

They also gave the test to expert ophthalmologists, trainees working to become ophthalmologists, and unspecialized junior doctors, who have about as much knowledge of eye problems as general practitioners.

GPT-4 significantly outperformed the junior doctors, scoring 69% on the test compared to their median score of 43%. It also scored higher than the trainees, who had a median score of 59%, and came within seven percentage points of the expert ophthalmologists' median score of 76%.

“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts,” lead author Arun Thirunavukarasu told the Financial Times.

Looking ahead: The Cambridge team doesn’t think LLMs will replace doctors, but they do envision the systems being integrated into clinical workflows — a GP who is having trouble getting in touch with a specialist for advice on how to treat something they haven’t seen in a while (or ever) could query an AI, for example.

“The most important thing is to empower patients to decide whether they want computer systems to be involved or not,” said Thirunavukarasu. “That will be an individual decision for each patient to make.”
