While great strides have been made in the development of humanoid robots such as Honda's ASIMO, giving robots a human face with natural expressions and movement has proven a difficult task. Some researchers aim to create lifelike faces using motors under artificial skin that replicate the function of facial muscles. German and Japanese researchers have instead joined forces on a different solution, called Mask-bot, in which a 3D image of a human face is projected onto the back of a plastic mask.
The Mask-bot displays realistic three-dimensional heads using a projector positioned behind a transparent plastic mask. The projector beams a human face onto the back of the mask to create realistic features that can not only be seen from various angles, including the side, but can also be changed on demand.
Dr. Takaaki Kuratate compares the Mask-bot approach to the technique used to project faces onto sculptures in Disneyland's Haunted Mansion ride. Those images, however, are projected from the front, whereas Mask-bot uses rear projection. This means there is only a 12 cm (4.7 in) gap between the face mask and the projection optics: a high-compression, 0.25x fish-eye lens fitted with a macro adapter.
To ensure the projected image is bright enough to be viewed in daylight, the team used a small but powerful projector and coated the inside of the plastic mask with luminous paint.
In developing Mask-bot, the team also faced the challenge of projecting a moving image onto the mask, rather than a static photo, without requiring video footage of the person speaking. To achieve this, they use a program that converts an ordinary two-dimensional photo into a correctly proportioned projection for the three-dimensional mask. Additional algorithms then supply the facial expressions and voice.
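The 2D-to-3D conversion step can be pictured as a geometric pre-distortion: pixels near the edge of a flat photo must be shifted so that, once the projection wraps around the curved mask, the features land in correctly proportioned positions. A minimal sketch of the idea, assuming a simple cylindrical surface as a stand-in for the mask (the function and geometry here are illustrative, not Mask-bot's actual program):

```python
import math

def predistort_x(u, width):
    """Pre-distort a horizontal pixel coordinate u (0..width) of a flat
    photo so that, when rear-projected onto a cylindrical surface, the
    features land at evenly spaced arc positions.

    Illustrative only: Mask-bot's real conversion handles a full face-shaped
    mask, not a cylinder, and its details are not public.
    """
    # Normalise to [-1, 1] about the image centre.
    x = (2.0 * u / width) - 1.0
    x = max(-1.0, min(1.0, x))
    # Equal arc lengths on the cylinder correspond to equal angles, so map
    # the linear coordinate through asin and renormalise to pixel units.
    half_angle = math.asin(x)
    max_angle = math.asin(1.0)  # angle reached at the image edge
    return (half_angle / max_angle + 1.0) * width / 2.0
```

The centre of the image stays put while off-centre features are drawn inward, compensating for the stretching the curved surface would otherwise introduce.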
The talking head animation engine developed by Takaaki Kuratate to replicate facial expressions filters an extensive set of face motion data, collected from people using a motion capture system, and selects the facial expressions that best match the specific sound, or phoneme, being spoken. The computer extracts a set of facial coordinates from each selected expression, which it can then assign to any new face. Emotion synthesis software is responsible for delivering the visible emotional nuances, indicating when someone is happy or sad, for example.
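The selection-and-retargeting pipeline described above can be sketched in a few lines. Everything here is an assumption for illustration: the toy database, the phoneme labels, and the uniform-scaling "retarget" step all stand in for Kuratate's far richer motion-capture data and per-face adaptation:

```python
# Toy motion-capture database: phoneme label -> list of (x, y) marker
# coordinates for the best-matching captured expression.
EXPRESSION_DB = {
    "aa": [(0.0, -0.8), (0.5, -0.6)],   # open-mouth vowel
    "m":  [(0.0, -0.1), (0.5, -0.1)],   # closed lips
    "f":  [(0.0, -0.3), (0.5, -0.2)],   # lower lip against teeth
}

def coords_for_phoneme(phoneme, db=EXPRESSION_DB):
    """Return the marker coordinates matching the phoneme, falling back to
    a neutral (all-zero) frame for phonemes not in the database."""
    if phoneme in db:
        return db[phoneme]
    return [(0.0, 0.0)] * len(next(iter(db.values())))

def retarget(coords, scale):
    """Assign the extracted coordinates to a new face; uniform scaling is
    a crude stand-in for the engine's real per-face adaptation."""
    return [(x * scale, y * scale) for x, y in coords]
```

In use, each phoneme in the synthesised speech would be looked up, retargeted to the current mask's face, and rendered in sequence to produce the talking animation.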
Mask-bot's current understanding of the spoken word is limited to listening and making appropriate responses as part of a fixed programming sequence. It can also realistically reproduce content typed on a keyboard in English, Japanese, and soon German. A text-to-speech system converts the text to audio signals, producing a male or female voice, which can then be set to quiet or loud, happy or sad, at the touch of a button.
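The button-selectable voice parameters described above, gender, volume, and mood, can be modelled as a small settings object passed to the synthesiser. The class, field names, and request format below are assumptions for illustration; the actual text-to-speech system's interface is not public:

```python
from dataclasses import dataclass

@dataclass
class VoiceSettings:
    """Hypothetical model of Mask-bot's button-selectable speech options."""
    gender: str = "female"   # "male" or "female"
    volume: str = "quiet"    # "quiet" or "loud"
    mood: str = "happy"      # "happy" or "sad"

def tts_request(text, settings):
    """Package typed text plus voice settings into the kind of request a
    text-to-speech front end might hand to the synthesiser."""
    allowed = {
        "gender": {"male", "female"},
        "volume": {"quiet", "loud"},
        "mood": {"happy", "sad"},
    }
    for field, values in allowed.items():
        value = getattr(settings, field)
        if value not in values:
            raise ValueError(f"unsupported {field}: {value!r}")
    return {"text": text, **settings.__dict__}
```

Validating the settings up front mirrors the fixed set of choices the article describes: each parameter toggles between two values at the touch of a button rather than varying continuously.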
While the researchers say the technology used in Mask-bot may one day give robots a human face, they anticipate it could appear sooner in avatars for video conference participants.
"Usually, participants are shown on screen. With Mask-bot, however, you can create a realistic replica of a person that actually sits and speaks with you at the conference table. You can use a generic mask for male and female, or you can provide a custom-made mask for each person," explains Takaaki Kuratate, who says such systems could also be used to provide companionship for older people.
The researchers are already working on Mask-bot 2, in which they aim to see the mask, projector and computer control system all contained inside a mobile robot. Although the first Mask-bot prototype cost just under EUR3,000 (approx. US$4,125), they estimate the successor model should cost around EUR400 (approx. US$550).
Mask-bot was created through a collaboration of the Technical University of Munich's (TUM) Cognition for Technical Systems (CoTeSys) Cluster of Excellence and Japan's National Institute of Advanced Industrial Science and Technology (AIST).