We’ve all seen footage of flight crews on the decks of aircraft carriers, directing taxiing planes using arm signals. That’s all very well and good when they’re communicating with human pilots, but what happens as more and more human-piloted military aircraft are replaced with autonomous drones? Well, if researchers at MIT are successful in one of their latest projects, not much should change. They’re currently devising a system that would allow robotic aircraft to understand human arm gestures.

The MIT team divided the project into two parts. The first involved getting the system to identify body poses within “noisy” digital images, while the second was concerned with identifying specific gestures within a series of movements – those deck crews don’t stay still for very long.

A stereoscopic camera was used to record a number of videos for the study, in which several different people demonstrated a total of 24 gestures commonly used on aircraft carrier decks. While a device like the Microsoft Kinect could now pick out the body poses in that footage reasonably well, such technology wasn’t around at the time the study began. Instead, a system was created that picked out the positions of the subjects’ elbows and wrists, and noted whether their hands were open or closed and whether their thumbs were up or down.

What the researchers are focusing on now is a way of sifting through all those continuous back-to-back poses, and isolating the different gestures for identification by the drones. It would take too long and require too much processing to retroactively analyze thousands of frames of video, so instead the system breaks the footage up into sequences about three seconds (or about 60 frames) in length. Because one gesture might not be fully contained within any one of those sequences, the sequences overlap one another – frames from the end of one sequence are also included in the beginning of the next.
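
The overlapping sequences described above can be sketched in a few lines of Python. This is only an illustration, not the MIT team’s code: the function name `overlapping_windows` and the 20-frame overlap are assumptions, since the article gives only the sequence length of about 60 frames.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def overlapping_windows(
    frames: Sequence[T],
    window: int = 60,   # about three seconds of footage, per the article
    overlap: int = 20,  # assumed value; the article does not state the overlap
) -> Iterator[Sequence[T]]:
    """Yield consecutive windows that share `overlap` frames with their neighbor."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    start = 0
    while start < len(frames):
        # The final window may be shorter than `window` if frames run out.
        yield frames[start:start + window]
        if start + window >= len(frames):
            break
        start += step
```

With these assumed settings, 140 frames of footage would produce three sequences, and the last 20 frames of each sequence reappear as the first 20 frames of the next, so a gesture that straddles a boundary is still seen whole in one of them.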

The system starts by analyzing the person’s body pose in each frame. It then cross-references that pose with each of the 24 possible gestures, and uses an algorithm to calculate which gesture is most likely being made. This estimation process is then applied to the string of poses that make up the whole sequence, and then to several successive sequences.
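
To make that estimation process concrete, here is a rough Python sketch. The article does not say which algorithm the researchers use, so this substitutes a simple stand-in: it sums per-frame log-probabilities to pick the likeliest gesture for a sequence, then takes a majority vote across successive sequences. The function names, and the assumption that some earlier step already supplies a probability for each of the 24 gestures in every frame, are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

NUM_GESTURES = 24  # number of deck gestures in the study, per the article


def classify_sequence(frame_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the gesture with the highest total log-probability.

    `frame_probs[t][g]` is an assumed per-frame probability that the pose in
    frame t belongs to gesture g; how it is computed is outside this sketch.
    """
    totals = [0.0] * NUM_GESTURES
    for probs in frame_probs:
        if len(probs) != NUM_GESTURES:
            raise ValueError("each frame needs one probability per gesture")
        for gesture, p in enumerate(probs):
            # Clamp to avoid log(0) when a gesture is ruled out in one frame.
            totals[gesture] += math.log(max(p, 1e-9))
    return max(range(NUM_GESTURES), key=totals.__getitem__)


def vote_over_sequences(sequence_labels: Sequence[int]) -> int:
    """Return the label chosen most often across several successive sequences."""
    if not sequence_labels:
        raise ValueError("need at least one sequence label")
    return Counter(sequence_labels).most_common(1)[0][0]
```

Summing log-probabilities rather than multiplying raw probabilities keeps the arithmetic stable over 60 or so frames, and the vote across sequences smooths out a single misread window.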

So far, in identifying gestures from the video database, the system has managed an accuracy rate of about 76 percent. However, the researchers are confident that refining the algorithms could vastly improve that rate.


Source: MIT