I am what I am, I’m Popeye the audio-visual robot
The ease with which human beings make sense of their environment through a range of sensory signals belies the complex processing involved. Approaches to giving robots the same purposeful perception we take for granted have typically studied visual and auditory processes independently. By combining data from both sound and vision, European researchers have developed technology that could help robots understand and respond to human behaviour, even conversations, bringing us closer to a future where humanoid robots can act as guides, mix with people, or use perception to infer appropriate actions.
Although the team from the Perception On Purpose (POP) project encountered difficulties in integrating two different sensory modalities, namely sound and vision, it found that combining the two senses helped overcome the limitations of each. Vision allows the observer to infer properties such as size, shape, density and texture, whereas sound reveals the direction of a source and what kind of sound it is. On its own, a sound source is difficult to pinpoint: it must be located in 3D space, and background noise further confuses the estimate. By combining visual and auditory data, the researchers found it was much easier for a robot to decide what is foreground and what is background.
The team managed to integrate all the required technology, including two microphones and two cameras, into the head of its robot, Popeye, resulting in a neat and compact platform. Using this set-up with the algorithms the team developed, Popeye was able to identify a speaker with a fair degree of reliability.
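The article does not detail POP's algorithms, but the general idea of fusing the two modalities can be sketched in a few lines: estimate the delay between the two microphone signals, convert it to a direction, and then pick the visually detected face that best matches that direction. The sample rate, microphone spacing and function names below are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def estimate_itd(left, right):
    """Estimate the inter-microphone delay (in samples) by
    cross-correlating the two channels (illustrative, not POP's method)."""
    corr = np.correlate(right, left, mode="full")
    return int(np.argmax(corr)) - (len(left) - 1)

def itd_to_azimuth(delay_samples, fs=16000, mic_distance=0.2, c=343.0):
    """Map a sample delay to an azimuth angle (radians) with a simple
    far-field model: delay = mic_distance * sin(theta) / c."""
    s = np.clip(delay_samples * c / (fs * mic_distance), -1.0, 1.0)
    return float(np.arcsin(s))

def pick_speaker(audio_azimuth, face_azimuths):
    """Fuse modalities: choose the visually detected face whose
    direction best matches the audio direction estimate."""
    diffs = [abs(a - audio_azimuth) for a in face_azimuths]
    return int(np.argmin(diffs))

# Simulated example: the right channel lags the left by 5 samples.
rng = np.random.default_rng(0)
left = rng.standard_normal(4000)
right = np.concatenate([np.zeros(5), left[:-5]])

delay = estimate_itd(left, right)          # recovers the 5-sample lag
azimuth = itd_to_azimuth(delay)            # direction of the sound source
speaker = pick_speaker(azimuth, [-0.8, 0.1, 0.6])  # index of matching face
```

The fusion step is where the two senses help each other, as the article notes: on its own the audio bearing is ambiguous and noisy, but snapping it to one of a handful of visually detected faces turns a fuzzy direction into a concrete answer to "who is speaking?".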
POP’s coordinator, Radu Horaud, feels that some modern uses of artificial intelligence (AI), like chess applications, are limited because they do not learn from their environment. They are programmed with abstract data – say, chess moves – and they process that.
“They cannot infer predicates from natural images; they cannot draw abstract information from physical observations,” he stresses.
POP has achieved many of its aims, and commercial applications for this type of technology are not out of the question. The researchers also hope to continue their work in a follow-on project that would extend some of POP's results into a functioning humanoid robot. In the meantime, POP's work means the purposefully perceptive robot has become a not-so-distant future technology.