A lot has changed with speech recognition software since that time we published a whole article written entirely by 2009's hottest speech-to-text engine, Dragon NaturallySpeaking (it did a pretty good job, all things considered). Google Now, Cortana and Siri have placed this technology in the hands of millions and put our vocal cords at the helm of all kinds of smartphone functionality, but research shows we still have our reservations. A new study has indicated that these systems can not only pump out text messages much faster than we can, but do so with better accuracy, suggesting perhaps it might not hurt to raise our voices every now and then when arranging our weekend plans.
"Speech recognition is something that's been promised to us for decades, but it has never worked very well," said James Landay, a professor of computer science at Stanford University and co-author of the new study. "But we were noticing that in the past two to three years, speech recognition was actually improving a lot, benefiting from big data and deep learning to train its neural networks to produce faster, more accurate results. So we decided to formally test it against humans."
Landay and colleagues from Stanford and the University of Washington set up an experiment that placed 32 texters, aged 19 to 32, head-to-head with the deep-learning speech recognition software Deep Speech 2 from Baidu, Google's search rival in China. With half of these subjects typing in English on a QWERTY keyboard and the other half in their native Mandarin using iOS' Pinyin keyboard, they were made to type or speak more than 100 typical phrases like "have a good weekend" and "go out for some pizza and beer."
The researchers say that in both languages, the speech-to-text proved much faster than physically typing out text messages. It proved three times faster than typing in English and 2.8 times faster than typing in Mandarin. This isn't overly surprising – the box of Dragon NaturallySpeaking in 2009 claimed the technology was three times faster than typing, albeit on a computer keyboard rather than a fiddly smartphone touchscreen. But what good is laying down words quickly if those words aren't even the ones we were aiming for?
Well, the study also indicated that the technology was far more accurate than its fumbling, fingered competitors. The error rate in English was 20.4 percent lower than typing, while in Mandarin it was a considerable 63.4 percent lower. While the researchers only used Baidu's Deep Speech 2 in the testing, they believe that other high-grade speech-to-text engines would work at a similar level, something they hope will inspire engineers to make better use of the technology, beyond the realms of text messaging and emails.
"We should put speech in more applications than just typing an email or text message," says Landay. "You could imagine an interface where you use speech to start and then it switches to a graphical interface that you can touch and control with your finger."
The team's research can be read online courtesy of Stanford.
Source: Stanford University