Even when using a hearing aid, it can be quite difficult for deaf people to make out specific individuals' voices in noisy environments. The new SpeakerBeam system could help, by automatically recognizing and boosting a select person's voice.
In crowded settings such as social gatherings, conventional hearing aids simply amplify all of the sound in the room. As a result, the boosted voice of any one person is drowned out by the boosted voices of everyone else. This difficulty of picking out a single voice from many is known as the "cocktail party problem."
Some technologies – which can be incorporated into existing hearing aids – address the problem by isolating and amplifying the voice of the individual located directly in front of the hearing aid user. These systems do work, but only as long as the speaker and the deaf person remain face-to-face.
Developed by scientists at Japan's NTT Corporation, SpeakerBeam takes a different approach.
It utilizes two neural networks, one of which first has to be trained on a 10-second recording of the speaker's voice. That network analyzes the recording – known as an "adaptation utterance" – to determine the qualities that make the voice unique.
In a subsequent cocktail-party-like environment, the other neural network uses that voice-signature data to pick out the speaker's voice from those of other people located nearby. It then amplifies only that voice, and continues to do so even if the speaker and the user turn away from one another.
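The two-network pipeline can be sketched in miniature. In the toy Python version below, everything is an illustrative assumption rather than NTT's actual architecture: the "embedding" is simply an average magnitude spectrum of the adaptation utterance, and the "extraction network" is a fixed soft spectral mask derived from that embedding. Real systems learn both jointly.

```python
import numpy as np

def speaker_embedding(adaptation_utterance, frame_len=256):
    """Toy stand-in for the first network: summarize the adaptation
    utterance as an average magnitude spectrum (a crude voice signature)."""
    n = len(adaptation_utterance) // frame_len * frame_len
    frames = adaptation_utterance[:n].reshape(-1, frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def extract_target(mixture, embedding, frame_len=256):
    """Toy stand-in for the second network: apply a soft spectral mask,
    keeping frequency bins where the target voice has energy and
    attenuating the rest, then resynthesize the waveform."""
    n = len(mixture) // frame_len * frame_len
    frames = mixture[:n].reshape(-1, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mask = embedding / (embedding.max() + 1e-9)  # values in [0, 1]
    return np.fft.irfft(spectra * mask, n=frame_len, axis=1).ravel()
```

Given a 440 Hz "target voice" as the adaptation utterance and a mixture of that tone with a 1200 Hz interferer, the mask passes the 440 Hz component and suppresses the rest. The key idea this illustrates is that the extraction step is conditioned on the voice signature, not on where the speaker is standing, which is why the real system keeps working when speaker and listener turn away from each other.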
One drawback of SpeakerBeam is that it requires the initial adaptation utterance, meaning it won't work with speakers who haven't already provided a voice sample.
It also sometimes gets confused when two people with similar voices are speaking at the same time. The scientists are working on addressing this problem by refining the voice recognition algorithm, and by identifying the direction from which each voice is coming.
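The article doesn't detail how the direction of each voice would be identified, but a standard ingredient is comparing arrival times of a sound across two microphones. The sketch below (function name and two-microphone setup are assumptions for illustration) estimates the inter-microphone delay via cross-correlation; that delay, together with the microphone spacing, determines the angle the sound arrived from.

```python
import numpy as np

def estimate_delay(mic_a, mic_b):
    """Estimate how many samples later a sound arrives at mic_b than at
    mic_a, using the peak of their cross-correlation. Given the mic
    spacing d and speed of sound c, the arrival angle theta satisfies
    delay = d * cos(theta) / c (far-field approximation)."""
    corr = np.correlate(mic_b, mic_a, mode="full")
    return int(np.argmax(corr)) - (len(mic_a) - 1)
```

A signal reaching one microphone three samples after the other yields an estimated delay of 3; combining such delay cues with the voice-signature matching could help the system tell apart two similar voices arriving from different directions.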
Source: NTT