
Brain waves become spoken words in AI breakthrough for paralysis

Researchers from UC Berkeley and UC San Francisco connect their subject Ann's brain implant to the voice synthesizer computer

California-based researchers have developed an AI-powered system that restores natural speech to paralyzed people in real time, using their own voices.

This new technology, from researchers at the University of California, Berkeley and the University of California, San Francisco, takes advantage of devices that can tap into the brain to measure neural activity, along with AI that learns to reconstruct the sound of a patient's voice.

That's far ahead of advancements as recent as last year in the field of brain-computer interfaces for synthesizing speech.

"Our streaming approach brings the same rapid speech decoding capacity of devices like Alexa and Siri to neuroprostheses," explained Gopala Anumanchipalli, an assistant professor of electrical engineering and computer sciences at UC Berkeley and co-principal investigator of the study that appeared this week in Nature Neuroscience. "Using a similar type of algorithm, we found that we could decode neural data and, for the first time, enable near-synchronous voice streaming. The result is more naturalistic, fluent speech synthesis."

What's neat about this tech is that it can work effectively with a range of brain-sensing interfaces. That includes high-density electrode arrays that record neural activity directly from the brain surface (like the setup the researchers used), as well as microelectrodes that penetrate the brain's surface, and non-invasive surface electromyography (sEMG) sensors on the face that measure muscle activity.

Here's how it works. First, the neuroprosthesis fitted to the patient samples neural data from their brain's motor cortex, which controls speech production. AI then decodes that data into speech. Cheol Jun Cho, who co-authored the paper, explained, "... what we’re decoding is after a thought has happened, after we’ve decided what to say, after we’ve decided what words to use and how to move our vocal-tract muscles."
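
To make that pipeline concrete, here is a minimal sketch of the decode loop in Python. Everything in it is illustrative: the channel count, window size, feature pooling, and `decoder` interface are this example's assumptions, not the study's actual architecture.

```python
import numpy as np

N_CHANNELS = 253   # assumed electrode count for a high-density array
WINDOW = 80        # assumed samples per decoding window (~80 ms at 1 kHz)

def decode_stream(neural_signal: np.ndarray, decoder):
    """Slide over motor-cortex activity and emit speech features per window.

    neural_signal: (N_CHANNELS, total_samples) array of recorded activity.
    decoder: any trained model exposing a predict(features) method.
    """
    for start in range(0, neural_signal.shape[1] - WINDOW + 1, WINDOW):
        window = neural_signal[:, start:start + WINDOW]
        features = window.mean(axis=1)             # crude per-channel pooling
        yield decoder.predict(features[None, :])   # one frame of speech output
```

The key design point the article describes is that decoding happens window by window as the signal arrives, rather than after a whole sentence has been attempted.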

That AI was trained on brain activity captured while the patient silently attempted to speak words that appeared on a screen in front of them. This allowed the team to map the neural activity to the words they were trying to say.
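
As a rough illustration of that training setup, the sketch below pairs sequences of neural activity with the cued words and fits a small recurrent model. The architecture and the CTC loss are assumptions made for this example; the paper's actual model is not reproduced here.

```python
import torch
import torch.nn as nn

class NeuralToTextModel(nn.Module):
    """Maps neural feature sequences to per-frame token log-probabilities."""
    def __init__(self, n_channels=253, hidden=256, n_tokens=40):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_tokens)

    def forward(self, x):                     # x: (batch, time, channels)
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(-1)   # log-probs, as CTC expects

model = NeuralToTextModel()
ctc = nn.CTCLoss(blank=0)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on synthetic data shaped like silent-attempt trials:
neural = torch.randn(4, 200, 253)            # 4 trials, 200 time steps each
targets = torch.randint(1, 40, (4, 12))      # token ids for the on-screen words

optim.zero_grad()
log_probs = model(neural).transpose(0, 1)    # CTC wants (time, batch, tokens)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 12))
loss.backward()
optim.step()
```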

In addition, a text-to-speech model – developed using the patient's own voice before they were injured and paralyzed – generates the audio that you can hear from the patient 'speaking.'
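
Conceptually, that voice-matching stage works like a speaker-conditioned synthesizer: the patient's pre-injury recordings are distilled into a voice profile once, and every decoded utterance is rendered through it. The sketch below is purely structural; `SpeakerEncoder` and `Vocoder` are hypothetical placeholders standing in for whatever models the team actually used.

```python
import numpy as np

class SpeakerEncoder:
    """Hypothetical: distills pre-injury recordings into a voice embedding."""
    def embed(self, reference_audio: np.ndarray) -> np.ndarray:
        return np.zeros(256)  # placeholder embedding vector

class Vocoder:
    """Hypothetical: renders decoded speech features in the target voice."""
    def synthesize(self, speech_features, voice_embedding) -> np.ndarray:
        return np.zeros(16000)  # placeholder: 1 s of audio at 16 kHz

# Wiring: embed the patient's old voice once, then reuse it per utterance.
voice = SpeakerEncoder().embed(np.zeros(160000))       # pre-injury recording
audio = Vocoder().synthesize(speech_features=np.zeros((100, 80)),
                             voice_embedding=voice)
```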

Video: A streaming brain-to-voice neuroprosthesis to restore naturalistic communication

In the proof-of-concept demonstration above, it appears the resulting speech isn't entirely perfect or completely naturally paced, but it's darn close. The system begins decoding brain signals and outputting speech within a second of the patient attempting to speak; that's down from 8 seconds in a previous study the team conducted in 2023.
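
The latency claim is easiest to see in a streaming loop that yields audio as soon as the first window is decoded, instead of waiting for the full utterance. The timing below is simulated for illustration; only the within-a-second figure comes from the article.

```python
import time

def fake_decode(window):
    time.sleep(0.08)          # simulate ~80 ms of per-window compute
    return b"audio-frame"

def stream_speech(windows, decode_fn=fake_decode):
    start = time.monotonic()
    for i, w in enumerate(windows):
        frame = decode_fn(w)
        if i == 0:
            # Streaming: first audio well under a second after speech onset,
            # versus waiting for the whole utterance to finish.
            print(f"first audio at {time.monotonic() - start:.2f}s")
        yield frame

list(stream_speech(range(10)))
```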

This could greatly improve the quality of life for people with paralysis and similar debilitating conditions like ALS, by helping them communicate everything from their day-to-day needs to their complex thoughts, and connect with loved ones more naturally.

The researchers' next steps are to speed up the AI's speech generation and to explore ways to make the output voice more expressive.

Source: UC Berkeley Engineering

3 comments
Username
They seem to get the words right, so why don't they use one of the many text-to-speech packages that sound natural?
anthony88
Am I right in thinking it also captured voice inflection for questions and statements? If so, it says a lot about what they have achieved and about how the mind processes language.
c w
@username - using text-to-speech synths wouldn't have the user's voice, no? Having the sound of the user's own voice seems key in this project.