Oxford's lip-reading AI outperforms humans

Some say only 30 percent of all speech is visible on the lips

Lip-reading is an inexact science, with motoring mouths making it hard to attribute sounds to each individual movement. Computer scientists at Oxford University have teamed up with Google's DeepMind to develop artificial intelligence that might give the hearing impaired a helping hand, with their so-called Watch, Attend and Spell (WAS) software outperforming a lip-reading expert in early testing.

The figures on lip-reading accuracy do vary, but one thing's for certain: it is far from a perfect way of interpreting speech. In an earlier paper, Oxford computer scientists reported that on average, hearing-impaired lip-readers can achieve 52.3 percent accuracy. Meanwhile, Georgia Tech researchers say that only 30 percent of all speech is visible on the lips.

Whatever the case, software that can automate the task or boost its accuracy could have a big impact on the lives of the hearing impaired. It is with this in mind that the Oxford team collaborated with DeepMind, the artificial intelligence company acquired by Google in 2014, to develop a system that can deliver better results.

The researchers did this by using computer vision and machine learning to train WAS on more than 5,000 hours of TV footage from the BBC. The videos included more than 118,000 sentences and a vocabulary of 17,500 words spoken by more than 1,000 different people.

They then put WAS to the test alongside a human expert lip-reader, tasking the pair with working out what was being said in a silent video using only the person's mouth movements. The human correctly read 12 percent of the words, while WAS interpreted 50 percent of the words correctly. It did make some errors, but the team says these were minor, including mishaps like missing an 's' at the end of a word or misspelling a word by a single letter.
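
The article does not say how the share of correctly read words was scored. As a rough illustration only, assuming a word-level edit-distance measure of the kind commonly used to score speech recognition (an assumption, not something the source confirms), a transcript could be compared against what was actually said as in the Python sketch below. The function names and the sample sentences are hypothetical and do not come from the researchers' system.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits needed to turn the hypothesis into the reference,
    divided by the number of words in the reference."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def is_minor_slip(ref_word, hyp_word):
    """True when two words differ by exactly one character edit,
    such as a missing 's' or a single wrong letter."""
    return edit_distance(ref_word, hyp_word) == 1


# Hypothetical example: the transcript drops the final 's' of one word.
spoken = "the committee meets on thursdays"
transcript = "the committee meets on thursday"

print(word_error_rate(spoken, transcript))       # 0.2 (one wrong word out of five)
print(is_minor_slip("thursdays", "thursday"))    # True
```

Under this assumed measure, a dropped 's' still counts as a wrong word, which is why a separate character-level check is useful for telling near misses apart from outright misreadings.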

While there is still some way to go before the technology is put into practice, the researchers tell the BBC that the aim is to get it working in real time, and that such a feat is feasible: so long as they keep training it on TV footage, it will keep learning.

"AI lip-reading technology would be able to enhance the accuracy and speed of speech-to-text especially in noisy environments and we encourage further research in this area and look forward to seeing new advances being made," said Jesal Vishnuram, Technology Research Manager of the British charity Action on Hearing Loss.

The research paper describing the system can be accessed here.

Source: Oxford University

3 comments
Bob Flint
This could be a help to some people, but the other side of things could be intrusions to our privacy....time to brush up on my ventriloquist skills...
LarryWolf
I wish they would focus on ASL and make a darn computer capable of reading any deaf person's or interpreter's signs and quickly convert that to typed text. But noooooooo.
GeoffSykes
How did this article get published without a HAL 9000 thumbnail or mention? I am disappointed.