New “AI doctor” predicts risk of death with 85% accuracy

Some of its predictions were better than those made by a team of doctors.

An “AI doctor” was able to make accurate predictions about patients based on medical notes doctors had written in their charts — suggesting a new way for technology to help guide healthcare.

“These results demonstrate that large language models make the development of ‘smart hospitals’ not only a possibility, but a reality,” said neurosurgeon Eric K. Oermann, the study’s senior author.

AI doctor: By training AI models on troves of medical data, researchers around the world have created AI systems capable of predicting patients’ disease risks. A UK team’s AI can look at retinal scans to predict a patient’s risk of cardiovascular disease, for example, while an AI developed at Mass General can predict the risk of melanoma recurrence just by looking at pictures of the initial skin cancer.

“One thing that’s common in medicine everywhere, is physicians write notes about what they’ve seen in clinic, what they’ve discussed with patients.”

Eric K. Oermann

In a new study, published in Nature, researchers at NYU set out to see whether they could train an AI to make predictions about patients based on medical notes that doctors and nurses jot down when treating patients.

“One thing that’s common in medicine everywhere, is physicians write notes about what they’ve seen in clinic, what they’ve discussed with patients,” Oermann told AFP. “So our basic insight was, can we start with medical notes as our source of data, and then build predictive models on top of it?”

How it works: While AIs have proven capable of making predictions based on very structured data, like scans and test results, medical notes can have far more variance — two doctors treating the same patient might use different language or abbreviations, or choose to focus on different things. [These notes are all in electronic health records (EHRs), so the AIs did not have to decipher doctors’ handwriting, at least.]

To decode patterns in these notes, the NYU team built a “large language model” (LLM) called “NYUTron” — the same type of AI that powers OpenAI’s popular ChatGPT and Google’s Bard.

“Large language models make the development of ‘smart hospitals’ not only a possibility, but a reality.”

Eric K. Oermann

To train NYUTron, researchers fed it millions of medical notes written by doctors in more than 380,000 patients’ EHRs. These included progress reports, discharge instructions, observations on lab results, and more, with the final dataset totalling about 4.1 billion words.

They then fine-tuned the AI to make five predictions about a patient based on their medical notes:

  • Length of stay in the hospital
  • Risk of being readmitted within 30 days after discharge
  • Risk of dying in the hospital before discharge
  • Risk of developing a new, related health issue
  • Risk of having an insurance claim denied

The results: After training, NYUTron was tested against traditional formulas based on standardized data — its ability to predict readmissions, for example, was compared to that of the LACE index, which looks at factors such as the length of a patient’s current stay and how many times they’ve been hospitalized in the past six months.
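For context, the LACE index is a simple point score built from structured data. The sketch below follows the commonly published scoring rules (these details come from the general literature, not from this study, and are simplified):

```python
def lace_score(los_days, acute_admission, charlson_index, ed_visits_6mo):
    """Approximate LACE readmission-risk score (higher = riskier).

    Sketch of the commonly published rules; real implementations
    should follow the original validated definition.
    """
    # L: points for length of stay in days
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = int(los_days)   # 1-3 days map to 1-3 points
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    # A: acuity — emergent/acute admission earns 3 points
    a = 3 if acute_admission else 0
    # C: Charlson comorbidity index, with scores of 4+ capped at 5 points
    c = charlson_index if charlson_index < 4 else 5
    # E: emergency-department visits in the prior six months, capped at 4
    e = min(ed_visits_6mo, 4)
    return l + a + c + e

# A 5-day acute stay, Charlson index 2, one recent ED visit:
print(lace_score(5, True, 2, 1))  # 4 + 3 + 2 + 1 = 10
```

The point of the comparison in the study is that this kind of hand-built score uses only a handful of structured inputs, whereas NYUTron reads the free-text notes directly.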

NYUTron outperformed the standard models on all five counts, correctly identifying 85% of patients who would die in the hospital and 80% of those who were readmitted, compared to 78% and 75% for the traditional models, respectively. It also correctly estimated 79% of patients’ stay lengths, compared to 68% for the standard model.
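“Correctly identifying 85% of patients who would die” is a statement about sensitivity (also called recall): of the patients who actually had the outcome, what fraction did the model flag in advance? A minimal sketch of that metric, using invented toy numbers rather than the study’s data:

```python
def recall(y_true, y_pred):
    """Fraction of actual positive cases (e.g., in-hospital deaths)
    that the model flagged ahead of time."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    positives = sum(y_true)
    return true_positives / positives

# Toy example: 4 real positives, the model catches 3 of them.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(recall(truth, preds))  # 3 of 4 positives caught -> 0.75
```

Note that recall alone says nothing about false alarms; the paper also reports discrimination metrics, but the headline percentages quoted here are of this sensitivity kind.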

“The most senior physician, who’s actually a very famous physician, he had superhuman performance, better than the model.”

Eric K. Oermann

NYUTron also beat out a group of six physicians who were tasked with predicting the readmission likelihood for 20 patients based on discharge notes in their EHRs — the doctors’ median accuracy at the task was 62.8%, while the AI’s was 77.8%.

However, the AI wasn’t the best at predicting the risk of readmission — a human doctor took the top spot.

“The most senior physician, who’s actually a very famous physician, he had superhuman performance, better than the model,” said Oermann. But the fact that the model can do better than the average physician, even if it can’t beat the very best, is still significant.

“The sweet spot for technology and medicine isn’t that it’s going to always deliver necessarily superhuman results, but it’s going to really bring up that baseline,” Oermann added.

Looking ahead: NYUTron has already been integrated with EHRs at NYU-affiliated hospitals throughout New York, but the researchers note the need for randomized clinical trials comparing interventions based on the AI’s predictions with those based on traditional methods — trials that will confirm whether or not the system can actually improve patient outcomes.

They also warn in their paper that doctors shouldn’t over-rely on the system, noting that more research is needed to identify any unexpected potential failure points or sources of biases.

Even after that research is conducted, they say the AI should be viewed as a tool for doctors — not a replacement for them. Clinicians are still ultimately the source of the observations and judgments that make it into the notes to begin with.
