MIT's AI predicts new strains of HIV, coronavirus

Researchers are using language-recognizing AI to predict virus mutations.

With new coronavirus variants cropping up seemingly by the day, it’s urgent that vaccines and public health efforts be able to stay ahead of the pandemic.

Unfortunately, mutations are random, and it’s usually impossible to predict what their impact will be when they do occur.

The trouble, basically, is that we don’t speak virus.

But what if we could?

Talking Virus

“When you say a sentence, it’s not just a random jumble of words,” says Brian Hie, a doctoral student at MIT. There is structure that corresponds to grammar and other rules, he says. “We kind of reasoned that … biology also has these kinds of patterns.”

Those patterns, he hypothesized, would be found in the chains of amino acids, the building blocks of life, from which all proteins — and cells, tissues, bacteria, and viruses — are made. 

This led Hie, along with MIT’s Bryan Bryson, an assistant professor of bioengineering, and Bonnie Berger, professor of applied mathematics, to try to crack that code the best way they knew how: with AI.

Hie took a kind of AI called natural language processing (NLP), the kind used by Siri and Alexa to understand the endless variations of human speech, and fed it viral genetic sequences that spell out the amino acids used to make critical proteins.

Buried in those strings of code, they hoped to find patterns that would allow them to predict what mutations a virus may develop, and rank the mutations by their potential to evolve around the human immune system.

Mutant Viruses

As you’re likely aware by now, viruses mutate, and they do it often.

Most of these mutations make little difference to the virus’s abilities. A few, however, can make all the difference — some specific coronavirus mutations in a bat or pangolin somewhere gifted SARS-CoV-2 the ability to jump into human beings. 

Other coronavirus mutations — like those identified in the U.K. and South Africa — have given rise to strains that may be more contagious or deadly. And some, like South Africa’s and another new strain discovered in the Amazonian metropolis of Manaus, may be able to slip past antibodies against the original virus, like those in recovered patients or people vaccinated against the original strain.

Mutating past our defenses is known as immune escape. These virus mutations are bad news. 


But being able to read the virus’s code and predict what the mutations mean could be a game-changer. 

In their study, published in Science, the MIT team trained their NLP models on thousands of lab-confirmed viral genetic sequences stored in public databases.

“There are these very large corpuses of viral sequences just available online,” Hie says, covering viruses from HIV to influenza. Per MIT Technology Review, the AI was fed 45,000 influenza sequences, 60,000 for HIV, and around 4,000 for SARS-CoV-2. (That last one is, as you can imagine, growing.)

“The fortunate thing was that these datasets are public, they’re peer-reviewed, and they’re experimentally validated,” Bryson says. “So we took that as our gold standard.” 

Armed with this library of viral sequences, they trained three separate NLP models, one for each virus, to predict mutations that are different enough to escape the immune system but not so different that they break the virus.
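To make that trade-off concrete, here is a minimal sketch in Python of how candidate mutations might be ranked on two axes at once. The article does not spell out the team's actual scoring formula, so everything here is an assumption for illustration: the `Candidate` fields, the made-up scores, and the choice to simply add the two ranks together.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    mutation: str     # hypothetical label for a mutated sequence
    change: float     # how different the mutant looks to the model (higher = more different)
    viability: float  # how plausible the mutant still is as a working virus (higher = more plausible)


def rank_positions(values: List[float]) -> List[int]:
    """Give each value a rank, where 1 is the smallest value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_escape_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates so those that are both different AND viable come first.

    Assumption: the two ranks are simply added. This is one simple way to
    reward mutations that do well on both axes, not the team's stated method.
    """
    change_ranks = rank_positions([c.change for c in candidates])
    viability_ranks = rank_positions([c.viability for c in candidates])
    scored = [
        (cr + vr, c)
        for cr, vr, c in zip(change_ranks, viability_ranks, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Made-up scores, for illustration only.
candidates = [
    Candidate("M1", change=0.9, viability=0.1),  # very different, but probably broken
    Candidate("M2", change=0.1, viability=0.9),  # viable, but barely different
    Candidate("M3", change=0.7, viability=0.8),  # different and still viable
    Candidate("M4", change=0.2, viability=0.2),  # neither
]
ranked = rank_escape_candidates(candidates)
print([c.mutation for c in ranked])
```

In this toy example, the mutation that scores reasonably well on both axes (M3) lands at the top, ahead of the one that is very different but likely broken and the one that is viable but nearly unchanged.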

To validate their results, they went right to the viruses themselves, testing in the lab whether the mutations the models flagged as likely escapes did better than chance at beating antibodies against the virus.
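What "better than chance" means for a ranking can be sketched with a short Python example. The article does not say which metric the team used; the area under the ROC curve (AUC) is assumed here because it is a common choice, and the scores and labels below are invented. AUC is the probability that a randomly chosen confirmed escape mutation gets a higher score than a randomly chosen non-escape one, so 0.5 corresponds to guessing.

```python
from typing import List


def auc(scores: List[float], is_escape: List[bool]) -> float:
    """Fraction of (escape, non-escape) pairs where the escape scores higher.

    Ties count as half a win. A value of 0.5 is what random guessing gives.
    """
    pos = [s for s, y in zip(scores, is_escape) if y]
    neg = [s for s, y in zip(scores, is_escape) if not y]
    if not pos or not neg:
        raise ValueError("need at least one escape and one non-escape example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented model scores and lab outcomes, for illustration only.
model_scores = [0.9, 0.8, 0.3, 0.2, 0.6]
lab_confirmed_escape = [True, False, False, False, True]
result = auc(model_scores, lab_confirmed_escape)
print(round(result, 3))
```

Here the model ranks confirmed escapes above non-escapes in five of the six possible pairs, well above the 0.5 expected from guessing.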

According to MIT Technology Review, their model beat the state of the art when it comes to predicting viral escape in the lab, with two highly impressive predictions for HIV and a strain of coronavirus. 

Reception thus far has been positive, the team says; when he can’t sleep, Bryson scans Twitter at 4 am looking for responses to the paper, and nothing negative’s come across the transom … yet. 

Strains In Silico

Testing these potential escape mutations in the lab always carries some kind of risk, but if the computer model gets good enough, eventually we may not have to.

“If you can do most of that in silico, then maybe that would be helpful,” Hie says.

Eventually, the team hopes that their NLP models will help guide researchers when it comes to not only what to look out for, but which places on a virus have a low chance of escape mutations, making them ideal targets for vaccines. 

For example, by aiming at the stalk of a flu protein, rather than its more changeable head, one could potentially create a universal flu vaccine, one of vaccinology’s Most Wanted.

“That’s where we want to go,” Berger says. “With suggesting experimentation and development of therapeutics.”
