Meta can (kinda) guess what you’ve heard via your brain waves

A new AI can produce the correct word being heard based on just brain measurements.

September 7, 2022

There are a host of bad things that can rob someone of their ability to speak — but for some, brain-computer interfaces may be the key to restoring communication, Jean-Rémi King, a research scientist at Meta, told TIME.

“By putting an electrode on the motor areas of a patient’s brain, we can decode activity and help the patient communicate with the rest of the world,” King said.

Already, a brain implant has restored a paralysis patient’s ability to communicate. Rather than needing to point at individual letters or words, the neuroimplant translates his thoughts directly into words.

Philip O’Keefe, an Aussie with ALS, has a brain-computer interface chip that allows him to translate his thoughts into texts, opening up an entire world of electronic communication, including Twitter. Perhaps most impressively, a patient whose ALS progressed to complete locked-in syndrome also received an implant that allowed communication.

Researchers at Meta are building AI models for decoding speech in the brain.

“But it’s obviously extremely invasive to put an electrode inside someone’s brain,” King said.

(In O’Keefe’s case, it’s worth noting the implant went in through his jugular, so he did not need open brain surgery, although it was a significant surgery nonetheless.)

“So we wanted to try using noninvasive recordings of brain activity. And the goal was to build an AI system that can decode brain responses to spoken stories.”

King and his colleagues at the Facebook Artificial Intelligence Research (FAIR) Lab have begun to do just that, creating a deep learning AI capable of decoding speech from brain waves — to an extent.

Listening in: In their study, currently online as a preprint, the team used an open source algorithm previously created at FAIR to analyze existing datasets, King wrote in Meta AI’s blog.

Those datasets contain more than 150 hours’ worth of brain recordings from 169 healthy volunteers, taken as they listened to audiobooks in Dutch and English.

Because the aim is decoding speech noninvasively, the team used data recorded by measuring the brain’s electrical activity — electroencephalography, or EEG — and magnetic activity, known as magnetoencephalography, or MEG.

Both are recorded via sensors on the outside of the skull, which constituted one of the researchers’ main challenges, King told TIME: “noisy” data limited by the distance of the sensors from the brain and by the effects of skin, skull, water, etc., on the signals.

All that noise is made even more difficult to cut through because, well, we’re not 100% sure what we’re looking for.

“The other big problem is more conceptual in that we actually don’t know how the brain represents language to a large extent,” King said.

Using both audiobook and brain recordings, the AI analyzed them to spot patterns between words heard and brainwaves.

This is the problem with decoding speech that the team wants to outsource to the AI: matching brain activity with an action, in this case what a subject is hearing.

Without the AI, it “would be very difficult to say, ‘OK, this brain activity means this word, or this phoneme, or an intent to act, or whatever,’” King said.

Decoding speech: After chopping up those hours into three-second bits, the team fed both the audiobook and brain recordings to the AI, which analyzed them to spot patterns.

The team kept back 10% of the data to test their model, New Scientist reported, using the patterns learned from the other 90% to try to identify the words heard in brain recordings it had never seen.
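To make the data preparation concrete, here is a minimal sketch of cutting a recording into three-second windows and holding out 10% of them for testing. The sampling rate, the recording length, the function names, and the simple random shuffle are all illustrative assumptions; the article does not describe how the researchers actually drew their split.

```python
import random

def split_into_windows(n_samples, sample_rate_hz, window_s=3.0):
    """Return (start, stop) sample indices for consecutive, non-overlapping windows."""
    size = int(round(window_s * sample_rate_hz))
    return [(start, start + size) for start in range(0, n_samples - size + 1, size)]

def holdout_split(items, test_fraction=0.1, seed=0):
    """Shuffle the items and return (train, test), keeping test_fraction aside."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical recording: 60 seconds sampled at 100 Hz (made-up numbers).
windows = split_into_windows(n_samples=6000, sample_rate_hz=100)
train, test = holdout_split(windows)
print(len(windows), len(train), len(test))
```

The key property is that the held-out windows never appear in the training set, so any score on them reflects recordings the model has not seen.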

“After training, our system performs what’s known as zero-shot classification: Given a snippet of brain activity, it can determine from a large pool of new audio clips which one the person actually heard,” King wrote in the Meta blog. “From there, the algorithm infers the words the person has most likely heard.”
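The retrieval step King describes can be sketched in a few lines, assuming the brain snippet and each candidate audio clip have already been turned into vectors in a shared space. The toy vectors, the clip names, and the use of cosine similarity as the matching score are assumptions made for illustration, not details confirmed by the article.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(brain_vec, audio_vecs):
    """Return candidate clip names sorted from most to least similar."""
    return sorted(
        audio_vecs,
        key=lambda name: cosine(brain_vec, audio_vecs[name]),
        reverse=True,
    )

# Made-up embeddings for three candidate audio clips.
audio_vecs = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.0, 0.0, 1.0],
}
# Made-up embedding of one snippet of brain activity.
brain_vec = [0.1, 0.9, 0.2]

ranking = rank_candidates(brain_vec, audio_vecs)
print(ranking)  # ['clip_b', 'clip_c', 'clip_a']
```

Because the candidates are only compared at test time, clips the system never saw during training can still be ranked, which is what makes the classification "zero-shot."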

Specifically, the AI leaned on its vocabulary of 793 words to make ten-word lists of its best guesses, New Scientist reported, roughly decoding speech.

According to their preprint, the AI was capable of getting the right word in the top ten 72.5% of the time when using three seconds of MEG data (hitting it on the first guess 44% of the time) and 19.1% of the time for EEG data.

The AI was capable of including the correct word in its offered list of answers up to 72.5% of the time.
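Figures like "in the top ten" and "first guess" are top-k accuracies: the share of test snippets where the correct word appears among the model's k best guesses. A minimal sketch of that calculation follows, using a made-up four-example test set and three-word guess lists rather than any data from the study.

```python
def top_k_accuracy(rankings, truths, k=10):
    """Fraction of examples whose true word is among the first k guesses."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)

# Made-up ranked guesses (best first) for four test snippets.
rankings = [
    ["the", "a", "cat"],
    ["dog", "cat", "the"],
    ["a", "the", "dog"],
    ["cat", "dog", "a"],
]
# The word each listener actually heard (also made up).
truths = ["the", "cat", "cat", "a"]

top1 = top_k_accuracy(rankings, truths, k=1)
top2 = top_k_accuracy(rankings, truths, k=2)
top3 = top_k_accuracy(rankings, truths, k=3)
print(top1, top2, top3)  # 0.25 0.5 0.75
```

The score can only rise as k grows, which is why a top-ten figure is always at least as high as the first-guess figure for the same data.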

What’s next: Imperial College London professor Thomas Knopfel told New Scientist that the system will need more refinement before it could be practically useful for decoding speech, and is skeptical that EEG and MEG — being noninvasive — could ever provide the granular detail needed for more accuracy.

“It’s about information flow,” Knopfel told New Scientist. “It’s like trying to stream an HD movie over old-fashioned analogue telephone modems. Even in ideal conditions, with someone sitting in a dark room with headphones on, just listening, there are other things going on in the brain. In the real world it becomes totally impossible.”

However, it’s possible that technological advances could change that: a newer form of MEG called OPM is pushing the envelope of what can be learned from outside.

For his part, King told TIME they are currently only decoding speech insofar as telling what people have heard in the scanner; it’s not meant to be for designing products yet, but only as basic research and proof of principle.