Disney's speech recognition system for kids cuts through the chatter
Barking voice commands at a phone, car, computer or dedicated voice assistant like Alexa is pretty commonplace these days, but these systems are usually designed with an adult manner of speaking in mind. Kids have very different speech patterns, so Disney Research has developed a system that caters to a younger crowd, picking out key words from excited chatter and overlapping speech to let kids play a video game with their voices.
Mole Madness is the name of the game, and kids control the character with just two simple voice commands. Playing in pairs (either with another child or a robot named Sammy), one player says "go" to get the mole moving across the screen, while their partner steers it upwards by saying "jump". As simple as that seems for a speech recognition system to identify, the kids threw a few spanners in the works with a tendency to chitchat and talk over each other.
"Kids don't necessarily pronounce words quite like adults and when they are playing together, as they like to do, they often engage in side banter, or exclamations of excitement, or simply talk over each other," says Jill Fain Lehman, lead researcher on the project. "That makes it tough for a speech-based system, even one that just has to detect the words 'go' and 'jump' as in Mole Madness."
At first, the voice recognition system had some human training wheels, in the form of a "wizard" in another room who would press a button on a controller whenever he heard a "jump" or "go" command. After 62 children between the ages of five and 10 had played the game, the researchers had enough data to train the system to recognize those key words, whether said individually or together, and to differentiate them from background noise and other bits of banter.
Once the system was automated and the wizard removed from the equation, the researchers reported it could pick out the keywords 85 percent of the time. Not bad, considering 40 percent of those commands overlapped when two kids were playing, and 32 percent were said faster than usual.
By comparison, a commercially available speech recognition system put through its paces on the same task could only recognize 50 percent of the commands, and struggled with overlapping and fast speech.
The automated system was also judged to be more engaging for players than the version in which a researcher entered the commands manually. According to parents who later watched video of the manually controlled sessions, the kids were closer to feeling like they "could take it or leave it" than solidly enjoying themselves. Once the game was automated, and could process a command within 150 milliseconds, the kids showed more signs of engagement with the game.
The system did run into some issues with the "jump" command when a new group of kids, some as young as four, tried out the game, but the researchers found that the participants soon modified their speech patterns to help the system through, repeating the commands or saying them more carefully.
"Speech recognition applications have become increasingly commonplace as the technology has matured, but understanding what kids say when they play remains difficult," says Jessica Hodgins, vice president at Disney Research. "This latest work by our researchers could make it possible to design any number of speech-based game or entertainment applications for children, including interactions with robots."
The researchers are presenting the study at the Workshop on Child Computer Interaction this week, and at the International Conference on Intelligent Virtual Agents later this month.