These days, with most people toting camera-packing smartphones, friends and families act as a veritable film crew, ready to capture important moments from a multitude of angles. But editing the footage into a cohesive whole can be a time-consuming chore. Now a team at Disney Research has developed an algorithm that automatically edits hours of raw footage into something less tedious to sit through.

Unlike software such as Magisto, Highlight Hunter and LiveLight, which help editors sort the video wheat from the chaff captured by a single camera, the algorithm developed at Disney Research combines footage of a single event captured from different points of view by different cameras. It does this by deducing what event is the most significant based on what the various cameras are focused on.

"Though each individual has a different view of the event, everyone is typically looking at, and therefore recording, the same activity – the most interesting activity," says Yaser Sheikh, an associate research professor of robotics at Carnegie Mellon University and part of the team at Disney Research Pittsburgh. "By determining the orientation of each camera, we can calculate the gaze concurrence, or 3D joint attention, of the group."
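
To give a rough sense of what calculating a shared point of attention can involve, here is a minimal sketch, not the Disney Research method itself. It assumes each camera's 3D position and viewing direction are already known, and it finds the point closest, in the least-squares sense, to all of the cameras' optical axes. The function names and input format are hypothetical, and the axes are treated as infinite lines, so the sketch does not check that the point lies in front of each camera.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("camera axes are parallel; no unique point")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def joint_attention_point(positions, directions):
    """Least-squares point nearest to every camera's optical axis.

    For a line through p with unit direction d, the squared distance from x
    is |(I - d d^T)(x - p)|^2. Summing over cameras and setting the gradient
    to zero gives (sum M_i) x = sum M_i p_i, with M_i = I - d_i d_i^T.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in zip(positions, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                m_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += m_ij
                b[i] += m_ij * p[j]
    return solve3(a, b)


# Three hypothetical cameras, all aimed at the same spot.
target = (5.0, 5.0, 1.0)
positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 2.0)]
directions = [[t - p for t, p in zip(target, pos)] for pos in positions]
point = joint_attention_point(positions, directions)
```

With real footage the axes would not meet exactly, and the returned point would be a best compromise between them rather than a true intersection.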

Other approaches for automatically or semi-automatically combining footage from multiple cameras generally rely on selecting the most stable or best-lit footage and periodically switching between the available camera angles. But because the algorithm developed at Disney Research calculates the spatial relationship between the subject(s) and the various cameras, it is also able to adhere to established cinematographic guidelines.

These include the 180-degree rule, which says the camera needs to stay on one side of the axis that connects subjects in a scene. For example, if there are two people in a scene, the axis would be an imaginary line connecting them. Changing the camera angle from one side of this line to the other is called jumping or crossing the line, and it can confuse the viewer.
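
Once the positions of the subjects and cameras are known, the 180-degree rule can be checked with a simple geometric test. The sketch below is an illustration under assumed inputs, not the team's implementation: it works on a top-down 2D view and uses the sign of a cross product to tell which side of the axis a camera is on. All names and coordinates are hypothetical.

```python
def side_of_axis(subject_a, subject_b, camera):
    """Return +1 or -1 for the side of the A-B axis the camera is on, 0 if on it."""
    ax, ay = subject_a
    bx, by = subject_b
    cx, cy = camera
    # 2D cross product of (B - A) and (C - A); its sign encodes the side.
    cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (cross > 0) - (cross < 0)


def crosses_the_line(subject_a, subject_b, cam_from, cam_to):
    """True if cutting from cam_from to cam_to would jump across the axis."""
    s1 = side_of_axis(subject_a, subject_b, cam_from)
    s2 = side_of_axis(subject_a, subject_b, cam_to)
    return s1 != 0 and s2 != 0 and s1 != s2


# Two people facing each other, seen from above (hypothetical coordinates).
person_a, person_b = (0.0, 0.0), (4.0, 0.0)
cam_1, cam_2, cam_3 = (1.0, 3.0), (3.0, 2.0), (2.0, -3.0)

same_side_cut = crosses_the_line(person_a, person_b, cam_1, cam_2)
opposite_side_cut = crosses_the_line(person_a, person_b, cam_1, cam_3)
```

Here `same_side_cut` is `False` because both cameras sit on the same side of the line between the two people, while `opposite_side_cut` is `True`, marking a cut an editor would normally avoid.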

The system also avoids very short shots, which can be jarring to the viewer, and jump cuts. These are cuts between shots that differ only slightly in perspective, which give the viewer the impression of either a jump forward in time or jumpy camerawork.
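
These two constraints can be pictured as a filter on candidate cuts. The following is a simplified greedy sketch, not the algorithm described in the paper: it assumes a per-frame list of preferred cameras and one fixed viewing direction per camera, and it only allows a cut if the current shot has run long enough and the new angle is sufficiently different. The function names and both thresholds are hypothetical placeholders.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def choose_shots(preferred, view_dirs, min_shot_frames=60, min_angle_deg=30.0):
    """Follow the preferred camera per frame, rejecting short shots and jump cuts."""
    if not preferred:
        return []
    current = preferred[0]
    held = 0  # frames the current shot has been on screen
    chosen = []
    for cam in preferred:
        if cam != current:
            long_enough = held >= min_shot_frames
            distinct = angle_between_deg(view_dirs[current], view_dirs[cam]) >= min_angle_deg
            if long_enough and distinct:
                current, held = cam, 0
        chosen.append(current)
        held += 1
    return chosen


# Camera "b" points almost the same way as "a"; camera "c" is at right angles.
view_dirs = {"a": (1.0, 0.0), "b": (0.98, 0.2), "c": (0.0, 1.0)}
preferred = ["a", "a", "b", "b", "c", "c", "c"]
edit = choose_shots(preferred, view_dirs, min_shot_frames=3, min_angle_deg=30.0)
```

In this toy run the switch to camera "b" is never taken, first because the opening shot is still too short and then because the angle change is too small, while the later switch to camera "c" is allowed.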

Although the system takes several hours to carry out the computations necessary to put together a cohesive video lasting a few minutes, the Disney Research team says professional editors using the same raw footage took on average more than 20 hours to achieve similar results.

"The resulting videos might not have the same narrative or technical complexity that a human editor could achieve, but they capture the essential action and, in our experiments, were often similar in spirit to those produced by professionals," says Ariel Shamir, an associate professor of computer science at the Interdisciplinary Center, Herzliya, Israel, and a member of the Disney Research Pittsburgh team.

While the algorithm may not replace professional editors, its creators say it could assist them in editing large amounts of footage.

The Disney Research Pittsburgh team will present a paper (PDF) on their algorithm at ACM SIGGRAPH 2014, which is currently underway in Vancouver, Canada.

The video below demonstrates how the algorithm works.
