MIT's AI predicts new strains of HIV, coronavirus

Researchers are using language-recognizing AI to predict virus mutations.

January 31, 2021

With new coronavirus variants cropping up seemingly by the day, it’s urgent that vaccines and public health efforts be able to stay ahead of the pandemic.

Unfortunately, mutations are random, and it’s usually impossible to predict what their impact will be when they do occur.

The trouble, basically, is that we don’t speak virus.

But what if we could?

Talking Virus

“When you say a sentence, it’s not just a random jumble of words,” says Brian Hie, a doctoral student at MIT. There is structure that corresponds to grammar and other rules, he says. “We kind of reasoned that … biology also has these kinds of patterns.”

Those patterns, he hypothesized, would be found in the chains of amino acids, the building blocks of life, from which all proteins — and cells, tissues, bacteria, and viruses — are made.

This led Hie, along with MIT’s Bryan Bryson, an assistant professor of bioengineering, and Bonnie Berger, professor of applied mathematics, to try to crack that code the best way they knew how: with AI.

Hie turned to a kind of AI called natural language processing (NLP), the kind Siri and Alexa use to understand the endless variations of human speech, and fed it viral genetic sequences that spell out the amino acids used to make critical proteins.

The team hoped to find patterns buried in those strings of code that would allow them to predict which mutations a virus may develop, and to rank those mutations by their potential to evade the human immune system.

Mutant Viruses

As you’re likely aware by now, viruses mutate, and they do it often.

Most of these mutations make little difference to the virus’s abilities. A few, however, can make all the difference — some specific coronavirus mutations in a bat or pangolin somewhere gifted SARS-CoV-2 the ability to jump into human beings.

Other coronavirus mutations — like those identified in the U.K. and South Africa — have given rise to strains that may be more contagious or deadly. And some, like South Africa’s and another new strain discovered in the Amazonian metropolis of Manaus, may be able to slip past antibodies against the original virus, like those in recovered patients or people vaccinated against the original strain.

Mutating past our defenses is known as immune escape. These virus mutations are bad news.

But being able to read the virus’s code and predict what the mutations mean could be a game-changer.

In their study, published in Science, the MIT team trained their NLP models on thousands of lab-confirmed viral genetic sequences stored in various databases.

“There are these very large corpuses of viral sequences just available online,” Hie says, covering viruses from HIV to influenza. Per MIT Technology Review, the AI was fed 45,000 influenza sequences, 60,000 for HIV, and around 4,000 for SARS-CoV-2. (That last one is, as you can imagine, growing.)

“The fortunate thing was that these datasets are public, they’re peer-reviewed, and they’re experimentally validated,” Bryson says. “So we took that as our gold standard.”

Armed with this library of viral sequences, they trained three separate NLP models, one for each virus, to predict mutations that are different enough to escape the immune system but not so different that they break the virus.
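One way to picture that trade-off is as a ranking problem: each candidate mutation gets one score for how much it changes the virus and another for how likely the virus is to survive it, and only mutations that do well on both rise to the top. The Python sketch below is a rough illustration, not the team's code. The mutation labels, the scores, and the choice to simply add the two ranks together are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each mutation to its rank, where 1 is the lowest score."""
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_escape_candidates(
    change: Dict[str, float], viability: Dict[str, float]
) -> List[Tuple[str, int]]:
    """Order mutations so that those scoring high on BOTH measures come first.

    Adding the two ranks is an assumed, illustrative way to combine them.
    """
    change_rank = rank_positions(change)
    viability_rank = rank_positions(viability)
    combined = {name: change_rank[name] + viability_rank[name] for name in change}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores between 0 and 1; none of these numbers are real data.
change = {"mut_A": 0.90, "mut_B": 0.20, "mut_C": 0.70, "mut_D": 0.95}
viability = {"mut_A": 0.80, "mut_B": 0.90, "mut_C": 0.60, "mut_D": 0.05}

ranked = rank_escape_candidates(change, viability)
for name, score in ranked:
    print(name, score)
```

In this toy data, mut_D changes the virus the most but would almost certainly break it, so it does not come out on top. The top spot goes to mut_A, which is both quite different and still viable.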

To validate their results, they went right to the viruses themselves, testing in the lab whether the mutations the models flagged as likely escapes did better than chance at beating antibodies against the virus.

According to MIT Technology Review, their model beat the state of the art when it comes to predicting viral escape in the lab, with two highly impressive predictions for HIV and a strain of coronavirus.

Reception thus far has been positive, the team says; when he can’t sleep, Bryson scans Twitter at 4 am looking for responses to the paper, and nothing negative’s come across the transom … yet.

Strains In Silico

Testing these potential escape mutations in the lab always carries some kind of risk, but if the computer model gets good enough, eventually we may not have to.

“If you can do most of that in silico, then maybe that would be helpful,” Hie says.

Eventually, the team hopes their NLP models will show researchers not only which mutations to look out for, but also which places on a virus have a low chance of escape mutations, making them ideal targets for vaccines.

For example, by aiming at the stalk of a flu protein, rather than its more changeable head, one could potentially create a universal flu vaccine, one of vaccinology’s Most Wanted.

“That’s where we want to go,” Berger says. “With suggesting experimentation and development of therapeutics.”
