Automatic 3D Model Construction for Turn-Table Sequences
Andrew W. Fitzgibbon, Geoff Cross and Andrew Zisserman
{awf,geoff,az}@robots.ox.ac.uk
Robotics Research Group,
Department of Engineering Science,
University of Oxford,
19 Parks Road, Oxford OX1 3PJ, United Kingdom


The Input:
Any sequence of images of a rotating object with a constant-colour
background.
All we need to know is that
the object is rotating about a single axis.


- Fully automatic procedure: converting the images to 3D models is a
black-box filter, with video in and VRML out.
- We don't require that the motion be regular: the angle between views
can vary, and it doesn't have to be known. Recovery of the angle is
automatic, with a standard deviation of about 40 millidegrees (0.04
degrees). In golfing terms, that's an even chance of a hole in one.
- We don't use any calibration targets: features on the objects
themselves are used to determine where the camera is, relative to the
turntable. Aside from being easier, this means that there is no problem
with the setup changing between calibration and acquisition, and that
anyone can use the software without special equipment.
For example, this dinosaur sequence was supplied to us by the University
of Hannover without any other information. (We do have the ground-truth
angles, which is how we can make the accuracy claim above, but they are
not used in the reconstruction.)
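
The golfing comparison can be sanity-checked with a little arithmetic. The sketch below is not the authors' calculation: it assumes a one-dimensional Gaussian error in direction only (distance control is ignored), uses the regulation hole diameter of 4.25 inches (about 108 mm), and asks at what range a 40 millidegree standard deviation gives a 50% chance of landing within the hole's radius.

```python
import math
from statistics import NormalDist

ANGLE_SIGMA_DEG = 0.040   # from the text: about 40 millidegrees
HOLE_DIAMETER_M = 0.108   # regulation golf hole, 4.25 inches (assumed target)

# For a zero-mean Gaussian, P(|x| < r) = 0.5 when r = z * sigma, where z is
# the 75th-percentile point of the standard normal distribution.
z_half = NormalDist().inv_cdf(0.75)
lateral_sigma_m = (HOLE_DIAMETER_M / 2.0) / z_half

# Range at which the angular error produces exactly that lateral spread.
distance_m = lateral_sigma_m / math.tan(math.radians(ANGLE_SIGMA_DEG))
print(f"even chance at roughly {distance_m:.0f} m")
```

Under these assumptions the break-even range comes out at a little over 100 m, which is the length of a short par-3 hole.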


The Output:
VRML model of the input object (plus camera position and turntable
angles)


Texturemapped (34K polygons):
Download VRML:
- texturemapped, 35K polygons, gzipped, 541K
- high resolution, 135K polygons, gzipped, 1.6MB
- very high res, hand only, 40K polygons, gzipped, 398K
Resolution: Spatial resolution depends on the resolution of the camera:
if you can see a feature in the images, you should see it on the model.
This high-resolution hand model is overlaid on one of the original images
to show the subpixel accuracy of the volume intersection.
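
Volume intersection keeps only those points in space that project inside the object's silhouette in every view. The sketch below illustrates the idea and is not the authors' implementation. It assumes an orthographic camera (the real system recovers perspective cameras from object features), a coarse voxel grid, and silhouettes supplied as predicate functions; the names `carve` and `disc` are hypothetical.

```python
import math

def carve(silhouettes, angles_deg, half_extent=1.0, n=16):
    """Return the voxel centres that fall inside every silhouette.

    silhouettes: one function (u, v) -> bool per view, standing in for a
                 bluescreen-segmented image (hypothetical interface).
    angles_deg:  turntable angle of each view, in degrees.
    """
    step = 2.0 * half_extent / n
    centres = [-half_extent + (i + 0.5) * step for i in range(n)]
    kept = set()
    for x in centres:
        for y in centres:
            for z in centres:
                inside_all = True
                for sil, angle in zip(silhouettes, angles_deg):
                    t = math.radians(angle)
                    # Rotate about the vertical (y) axis, then discard depth:
                    # an orthographic projection of the rotated point.
                    u = x * math.cos(t) + z * math.sin(t)
                    v = y
                    if not sil(u, v):
                        inside_all = False
                        break
                if inside_all:
                    kept.add((x, y, z))
    return kept

def disc(u, v, r=0.5):
    """Silhouette of a sphere of radius r centred on the turntable axis."""
    return u * u + v * v <= r * r

angles = [i * 10.0 for i in range(36)]
hull = carve([disc] * len(angles), angles)
print(len(hull), "voxels kept out of", 16 ** 3)
```

Because a voxel is removed only when it falls outside some silhouette, a cavity that is hidden in every silhouette is never carved away.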
Limitations: There are two types: fundamental and other.
The fundamental ones are:
- To use object features for camera calibration, the object's surface
must have some texture. The technique works poorly on completely smooth
objects (but see the paper for an example on an almost textureless object).
- If the camera aspect ratio is unknown (for example, if using an old
video camera), the model will be thinner at the top than at the bottom.
This is fixed using an interactive tool that allows manual rescaling.
The non-fundamental limitations relate to the bluescreening and volume
intersection processes, so fixing them is "just a matter of
programming".
- A bluescreen should not be needed. The volume intersection technique
currently used is very simple and will not get "inside" a complex model
(see the hand VRML above); we are developing a much more powerful
correlation-based modeler.
- The texture mapping currently just assigns the closest image to each
triangle; it needs colour correction and super-resolution.
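
The per-triangle texture assignment described in the last point can be sketched as follows. The text does not say how "closest" is measured, so this sketch assumes it means the view that sees the triangle most nearly head-on; the helper names, the camera radius, and the placement of cameras on a circle around the vertical axis (equivalent to the object rotating in front of a fixed camera) are all illustrative assumptions.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalise(v):
    length = math.sqrt(dot(v, v))  # zero for a degenerate triangle
    return (v[0] / length, v[1] / length, v[2] / length)

def turntable_cameras(angles_deg, radius=5.0):
    """Camera centres on a circle around the vertical (y) axis."""
    return [(radius * math.sin(math.radians(a)), 0.0,
             radius * math.cos(math.radians(a))) for a in angles_deg]

def best_view(triangle, camera_centres):
    """Index of the camera that faces the triangle most directly."""
    p0, p1, p2 = triangle
    normal = normalise(cross(sub(p1, p0), sub(p2, p0)))
    centroid = tuple(sum(c) / 3.0 for c in zip(p0, p1, p2))
    scores = [dot(normal, normalise(sub(c, centroid)))
              for c in camera_centres]
    return max(range(len(scores)), key=scores.__getitem__)

cams = turntable_cameras([0.0, 90.0, 180.0, 270.0])
facing_z = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
facing_x = ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(best_view(facing_z, cams), best_view(facing_x, cams))
```

Picking a single view per triangle is what produces visible seams where neighbouring triangles draw on differently exposed images, which is why colour correction is listed as a needed improvement.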


BibTeX entry
@InProceedings{Fitzgibbon98,
author = {Andrew W. Fitzgibbon and Geoff Cross and Andrew Zisserman},
title = {Automatic 3D Model Construction for Turn-Table Sequences},
booktitle = {Proceedings of SMILE Workshop on Structure from Multiple Images in Large Scale Environments},
publisher = {Springer Verlag},
series = {Lecture Notes in Computer Science},
volume = {1506},
year = {1998},
editor = {R. Koch and L. Van Gool},
pages = {154--170},
month = {June}
}