Cambridge University PhD student Qi Pan has designed software that creates textured 3D models in slightly more than a minute using a stationary camera, such as a webcam. Conventional off-line model reconstruction proceeds in separate phases: an image collection phase that can be quite quick is followed by a very slow reconstruction phase, so it takes a long time to verify that a model obtained from an image sequence is acceptable. The new software instead creates a 3D model on-line while the input sequence is being collected. As the user rotates the object in front of the camera, a partial model is reconstructed and displayed to the user.
While many automated and semi-automated 3D reconstruction techniques exist, current methods have a variety of drawbacks, meaning that many high-quality 3D models are still generated using labor-intensive methods or completely by hand.
Pan’s system, ProFORMA (Probabilistic Feature-based On-line Rapid Model Acquisition), enables textured 3D models to be created on-line in just over a minute, with models being reconstructed fast enough to provide feedback for view planning. The user is free to interact with the object, which is robustly tracked using a method that makes immediate use of the partial model.
Models are produced rapidly through a Delaunay tetrahedralization of points obtained from on-line structure from motion estimation, followed by a probabilistic tetrahedron carving step to obtain a textured surface mesh of the object. The model is also used by the system to robustly track the pose of the object.
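To make the carving idea more concrete, here is a minimal sketch in Python. It is not ProFORMA's actual algorithm or probability model. It assumes the tetrahedra have already been produced (for example by a Delaunay tetrahedralization), that every camera position lies outside the point cloud, and that each line of sight from a camera to an observed point is independent evidence that the tetrahedra it passes through are empty. The names `carve`, `p_ray_ok` and `keep_above` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def orient(a, b, c, d):
    """Signed volume (times 6) of the tetrahedron a, b, c, d."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    wx, wy, wz = (d[i] - a[i] for i in range(3))
    return ux * (vy * wz - vz * wy) + uy * (vz * wx - vx * wz) + uz * (vx * wy - vy * wx)

def segment_crosses_triangle(p, q, a, b, c):
    """True if segment pq passes strictly through the interior of triangle abc."""
    s1, s2 = orient(p, a, b, c), orient(q, a, b, c)
    if s1 * s2 >= 0:  # both ends on one side, or an end touching the plane
        return False
    s3, s4, s5 = orient(p, q, a, b), orient(p, q, b, c), orient(p, q, c, a)
    return (s3 > 0 and s4 > 0 and s5 > 0) or (s3 < 0 and s4 < 0 and s5 < 0)

def ray_hits_tetra(cam, point, verts):
    """A sight line starting outside the tetrahedron enters it only through a face."""
    return any(segment_crosses_triangle(cam, point, *face)
               for face in combinations(verts, 3))

def carve(points, tetras, rays, p_ray_ok=0.8, keep_above=0.5):
    """Return indices of tetrahedra that survive carving.

    rays is a list of (camera_position, observed_point_index) pairs.
    p_ray_ok is an assumed probability that a single sight line is reliable;
    a tetrahedron crossed by n lines survives with probability (1 - p_ray_ok)**n.
    """
    kept = []
    for idx, tet in enumerate(tetras):
        verts = [points[i] for i in tet]
        hits = sum(ray_hits_tetra(cam, points[j], verts) for cam, j in rays)
        if (1.0 - p_ray_ok) ** hits > keep_above:
            kept.append(idx)
    return kept

# Two tetrahedra sharing the face (0, 1, 2): one above z = 0, one below.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, -1)]
tetras = [(0, 1, 2, 3), (0, 1, 2, 4)]
# Two cameras above the object both see point 0 through the upper tetrahedron.
rays = [((0.2, 0.2, 3.0), 0), ((0.3, 0.1, 4.0), 0)]
kept = carve(points, tetras, rays)
print(kept)  # [1]
```

In this toy scene the upper tetrahedron is crossed by both sight lines and is carved away, while the lower one is never crossed and is kept. In practice the tetrahedra could come from a library routine such as `scipy.spatial.Delaunay`, whose `simplices` attribute lists the tetrahedra for 3D input points.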
Reconstruction doesn’t have to take place inside a studio, as is common practice; the system enables models of textured objects to be captured using a single camera and commodity hardware.
Pan, a student in Cambridge University’s engineering department, says: “We implement an on-line reconstruction system with a fixed position video camera. The user can move a textured object in front of the camera using their hand. This enables the object to be orientated so that all parts of the object (including the base) can be viewed and modeled.
“No assumptions about the object are made and no prior information is known about the object, although the object must be sufficiently textured. A partial model is built as the user moves the object around, providing immediate feedback to the user about the state of the reconstruction.”
When tracking the object to be modeled, it is very advantageous to use a video sequence rather than an image sequence as this takes advantage of small inter-frame motions (at high frame rates). In terms of reconstruction, however, temporally close and thus spatially close video frames provide very little 3D information, whilst adding to the reconstruction cost (which grows with the number of frames used).
“Therefore, we opt for a two-threaded key frame-based approach, with separate tracking and reconstruction threads. The tracking thread takes a video input of the scene and calculates the pose of the camera relative to the object at frame rate.
“It also tracks the location of 2D image features which adhere to rigid body motion constraints. When a large enough rotation is detected, a key frame is taken and passed to the reconstruction thread, which generates a partial 3D model of the object. This model is fed back into the tracker to provide additional structural information that can be used during tracking,” said Pan.
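The two-threaded key frame scheme Pan describes can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not ProFORMA's code: poses are simulated rotation matrices rather than the output of a real tracker, the 10 degree threshold `KEYFRAME_ROTATION_DEG` is a hypothetical value (the article gives none), and the reconstruction thread only records which frames became key frames instead of building a model. The feedback of the partial model into the tracker is left out.

```python
import math
import queue
import threading

KEYFRAME_ROTATION_DEG = 10.0  # hypothetical threshold, not taken from the article

def rot_y(deg):
    """Rotation matrix for a turn of `deg` degrees about the y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_angle_deg(ra, rb):
    """Angle of the relative rotation between ra and rb, from trace(ra^T rb)."""
    trace = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def tracking_thread(poses, keyframe_queue):
    """Walk the pose stream and emit a key frame after each large rotation."""
    last_key = None
    for index, pose in enumerate(poses):
        if last_key is None or rotation_angle_deg(last_key, pose) >= KEYFRAME_ROTATION_DEG:
            keyframe_queue.put((index, pose))
            last_key = pose
    keyframe_queue.put(None)  # sentinel: no more frames

def reconstruction_thread(keyframe_queue, model_keyframes):
    """Consume key frames; appending the index stands in for a model update."""
    while True:
        item = keyframe_queue.get()
        if item is None:
            break
        index, _pose = item
        model_keyframes.append(index)

# Simulated video: the object turns 3 degrees per frame, from 0 to 39 degrees.
poses = [rot_y(3.0 * k) for k in range(14)]
keyframe_queue = queue.Queue()
model_keyframes = []
threads = [
    threading.Thread(target=tracking_thread, args=(poses, keyframe_queue)),
    threading.Thread(target=reconstruction_thread, args=(keyframe_queue, model_keyframes)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model_keyframes)  # [0, 4, 8, 12]
```

Only four of the fourteen simulated frames reach the reconstruction thread, which reflects the motivation given above: closely spaced frames help tracking but add cost without adding much 3D information.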
He added that the model was also utilized by the visualization system to show the user the state of the reconstruction so far, with the model pose updated to the live pose of the object obtained from the tracking thread. The user provides feedback to the system by looking at the visualization and manipulating the object accordingly to provide new views.
The video below shows what Pan has achieved and the ease with which his software creates 3D models through user interaction.