Motion estimation is a very important problem in dynamic scene analysis. Although it is easier to estimate motion parameters from 3D data than from 2D images, it is not trivial since the 3D data we have are almost always corrupted by noise. This article presents a comparative study on motion estimation from 3D line segments. Two representations of line segments and two representations of rotation are described. With different representations of line segments and rotation, a number of methods for motion estimation are presented, including the Extended Kalman Filter, a general Minimization process and the Singular Value Decomposition. These methods are compared using both synthetic and real data obtained by a trinocular stereo. We observe that the Extended Kalman Filter with the rotation axis representation of rotation is preferable. We note that all methods discussed in this article can be directly applied to 3D point data.
Keywords: Motion Estimation, Motion from Stereo, Noisy System, Nonlinear System, Minimization.
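To make the rotation axis representation mentioned in this abstract concrete, here is a minimal sketch, assuming the rotation is encoded as a 3-vector whose direction is the axis and whose norm is the angle. It converts that vector to a rotation matrix with Rodrigues' formula and applies a rigid motion to a 3D point. The function names are hypothetical and are not taken from the article.

```python
import math


def rotation_matrix_from_axis_vector(r):
    """Rodrigues' formula: r = theta * n, with unit axis n and angle theta."""
    theta = math.sqrt(sum(component * component for component in r))
    if theta < 1e-12:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    nx, ny, nz = (component / theta for component in r)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + nx * nx * v, nx * ny * v - nz * s, nx * nz * v + ny * s],
        [ny * nx * v + nz * s, c + ny * ny * v, ny * nz * v - nx * s],
        [nz * nx * v - ny * s, nz * ny * v + nx * s, c + nz * nz * v],
    ]


def apply_motion(R, t, p):
    """Apply the rigid motion p -> R p + t to a 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


# Quarter turn about the z axis, no translation.
R = rotation_matrix_from_axis_vector([0.0, 0.0, math.pi / 2])
moved = apply_motion(R, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

The three-parameter vector has no redundant parameters or extra constraints, which is one reason such a representation is convenient inside an iterative estimator.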
In this paper, we describe a method for computing the movement of objects as well as that of a mobile robot from a sequence of stereo frames. Stereo frames are obtained at different instants by a stereo rig, when the mobile robot navigates in an unknown environment possibly containing some moving rigid objects. An approach based on rigidity constraints is presented for registering two stereo frames. We demonstrate how the uncertainty of measurements can be integrated with the formalism of the rigidity constraints. A new technique is described to match very noisy segments. The influence of egomotion on observed movements of objects is discussed in detail. Egomotion is first determined and then eliminated before determination of the motion of objects. The proposed algorithm is completely automatic. Experimental results are provided. Some remarks conclude this paper.
Keywords: Motion from Stereo, Egomotion, Multiple Object Motions, Mobile Robot, 3D Matching, Rigidity Constraints, Uncertainty of Measurements.
We present a method for estimating 3D displacements from two stereo frames. It is based upon the hypothesize-and-verify paradigm which is used to match 3D line segments between the two frames. In order to reduce the complexity of the method, we make the assumption that objects are rigid. We formulate a set of complete rigidity constraints for 3D line segments and integrate the uncertainty of measurements in this formulation. The hypothesize and verify stages of the method use an Extended Kalman Filter to produce estimates of the displacements and of their uncertainty. In the experimental sections, the algorithm is shown to work on indoor and natural scenes. Furthermore it is easily extended, as also shown, to the case where several mobile objects are present. The method is quite robust, fast, and has been thoroughly tested on hundreds of real stereo frames.
Keywords: Motion from stereo, 3D matching, rigidity constraints, uncertainty, hypothesize-and-verify, multiple object motions, extended Kalman filtering, robot vision.
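The idea of a rigidity constraint can be illustrated with a small sketch. It checks two quantities that a rigid motion leaves unchanged for a pair of segments: the angle between their supporting lines and the distance between their midpoints. This is only an illustration, not the complete set of constraints formulated in the paper. The fixed tolerances `angle_tol` and `dist_tol` are hypothetical stand-ins for thresholds that the paper derives from measurement uncertainty.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _midpoint(seg):
    return [(seg[0][i] + seg[1][i]) / 2.0 for i in range(3)]


def angle_between(seg_a, seg_b):
    """Unsigned angle between the supporting lines, in [0, pi/2]."""
    u = _sub(seg_a[1], seg_a[0])
    v = _sub(seg_b[1], seg_b[0])
    cos_angle = abs(sum(u[i] * v[i] for i in range(3))) / (_norm(u) * _norm(v))
    return math.acos(min(1.0, cos_angle))


def midpoint_distance(seg_a, seg_b):
    return _norm(_sub(_midpoint(seg_a), _midpoint(seg_b)))


def pair_is_compatible(pair1, pair2, angle_tol=0.05, dist_tol=0.05):
    """True if a pair of segments in frame 1 may match a pair in frame 2."""
    (a1, b1), (a2, b2) = pair1, pair2
    same_angle = abs(angle_between(a1, b1) - angle_between(a2, b2)) <= angle_tol
    same_dist = abs(midpoint_distance(a1, b1) - midpoint_distance(a2, b2)) <= dist_tol
    return same_angle and same_dist
```

In a hypothesize-and-verify scheme, a test of this kind would prune pairings before any motion is estimated.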
We address the problem of computing the three-dimensional motions of objects in a long sequence of stereo frames. Our approach is bottom-up and consists of two levels. The first level deals with the tracking of 3D tokens from frame to frame and the estimation of their kinematics. The processing is completely parallel for each token. The second level groups tokens into objects based on their kinematic parameters, controls the processing at the low level to cope with problems such as occlusion, disappearances and appearances of tokens, and provides information to other components of the system. We have implemented this approach using 3D line segments obtained from stereo as the tokens. We use classical kinematics and derive closed-form solutions for some special, but useful, cases of motions. The motion computation problem is then formulated as a tracking problem in order to apply the extended Kalman filter. The tracking is performed by feedforward computation in a prediction-matching-update loop in which multiple matches can be handled. The individual line segments can be grouped into objects according to the similarity of their kinematic parameters. Experiments using synthetic and real data have been carried out and good results can be observed.
Keywords: Image Sequence Analysis, 3D Motion Tracking and Computation, Kinematic Model, 3D Token tracker, Multiple Object Motions, Grouping, 3D Vision.
This article describes a system that incrementally builds a world model with a mobile robot in an unknown environment. The model is, for the moment, segment-based. A trinocular stereo system is used to build a local map of the environment. A global map is obtained by integrating a sequence of stereo frames taken as the robot navigates in the environment. The emphasis of this article is on the representation of the uncertainty of 3D segments from stereo and on the integration of segments from multiple views. The proposed representation is simple and convenient for characterizing the uncertainty of segments. A Kalman filter is used to merge matched line segments. An important characteristic of our integration strategy is that a segment observed by the stereo system corresponds to only one part of the segment in space, so the union of the different observations gives a better estimate of the segment in space. We have succeeded in integrating 35 stereo frames taken in our robot room.
Keywords: Uncertainty Representation, Multiple View Integration, World Model Builder, 3D Vision, Mobile Robot.
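The merging step can be pictured with a minimal sketch of a static Kalman update, assuming two independent estimates of the same 3D point (for example a segment midpoint) with diagonal covariances. The article's actual segment parameterization and covariance model are not reproduced here, and the names `fuse_scalar` and `fuse_point` are hypothetical.

```python
def fuse_scalar(x1, var1, x2, var2):
    """Kalman update of estimate (x1, var1) with measurement (x2, var2)."""
    gain = var1 / (var1 + var2)
    return x1 + gain * (x2 - x1), (1.0 - gain) * var1


def fuse_point(p1, vars1, p2, vars2):
    """Fuse two 3D point estimates coordinate by coordinate.

    Assumes diagonal covariances, which is a simplification.
    """
    fused = [fuse_scalar(a, va, b, vb) for a, va, b, vb in zip(p1, vars1, p2, vars2)]
    return [f[0] for f in fused], [f[1] for f in fused]


point, variances = fuse_point(
    [0.0, 0.0, 0.0], [1.0, 1.0, 4.0],
    [2.0, 2.0, 2.0], [1.0, 3.0, 4.0],
)
```

The fused variance is never larger than either input variance, which is how repeated observations tighten the estimate.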
This article describes the work at INRIA on obstacle avoidance and trajectory planning for a mobile robot using stereovision. Our mobile robot is equipped with a trinocular vision system which is being implemented in hardware and will be capable of delivering 3D maps of the environment at rates between 1 and 5 Hz. Those 3D maps contain line segments extracted from the images and reconstructed in three dimensions. They are used for a variety of tasks including obstacle avoidance and trajectory planning.
For those two tasks, we project the 3D line segments onto the ground floor to obtain a two-dimensional map, simplify the map according to some simple geometric criteria, and use the remaining 2D segments to construct a tessellation, more precisely a triangulation, of the ground floor. This tessellation has several advantages:
We show a variety of real examples in which our robot navigates freely in real indoor environments using this system.
A heuristic method has been developed for registering two sets of 3-D curves obtained by using an edge-based stereo system, or two dense 3-D maps obtained by using a correlation-based stereo system. Geometric matching in general is a difficult unsolved problem in computer vision. Fortunately, in many practical applications, some a priori knowledge exists which considerably simplifies the problem. In visual navigation, for example, the motion between successive positions is usually approximately known. From this initial estimate, our algorithm computes observer motion with very good precision, which is required for environment modeling (e.g., building a Digital Elevation Map). Objects are represented by a set of 3-D points, which are considered as the samples of a surface. No constraint is imposed on the form of the objects. The proposed algorithm is based on iteratively matching points in one set to the closest points in the other. A statistical method based on the distance distribution is used to deal with outliers, occlusion, appearance and disappearance, which allows us to do subset-subset matching. A least-squares technique is used to estimate 3-D motion from the point correspondences, which reduces the average distance between points in the two sets. Both synthetic and real data have been used to test the algorithm, and the results show that it is efficient and robust, and yields an accurate motion estimate.
Keywords: Free-Form Curve and Surface Matching, 3-D Registration, Motion Estimation, Dynamic Scene Analysis, 3-D Vision.
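The iterative closest-point loop described in this abstract can be sketched in a deliberately reduced form. The sketch estimates a translation only (the rotation is omitted to keep the code short), and it rejects matches whose distance exceeds the mean plus `k` standard deviations. That rejection rule is a simple stand-in for the paper's statistical method and is not a reproduction of it. All names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def closest_point(p, cloud):
    """Brute-force nearest neighbour of p in cloud."""
    return min(cloud, key=lambda q: math.dist(p, q))


def icp_translation(source, target, iterations=20, k=2.0):
    """Estimate the translation t such that source + t aligns with target."""
    t = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        moved = [[p[i] + t[i] for i in range(3)] for p in source]
        pairs = [(p, closest_point(p, target)) for p in moved]
        dists = [math.dist(p, q) for p, q in pairs]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        # Keep only matches whose distance is statistically plausible.
        kept = [pq for pq, d in zip(pairs, dists) if d <= mean + k * std]
        # For a pure translation, the least-squares update is the mean residual.
        step = [sum(q[i] - p[i] for p, q in kept) / len(kept) for i in range(3)]
        t = [t[i] + step[i] for i in range(3)]
        if math.sqrt(sum(s * s for s in step)) < 1e-9:
            break
    return t


# Target: a 3x3x3 grid. Source: the same grid shifted, plus one spurious point.
target = [[float(x), float(y), float(z)] for x in range(3) for y in range(3) for z in range(3)]
source = [[x - 0.1, y + 0.05, z] for x, y, z in target] + [[50.0, 50.0, 50.0]]
estimated = icp_translation(source, target)
```

Because the initial displacement is small compared with the point spacing, the first round of closest-point matches is already correct, which mirrors the assumption that the motion is approximately known.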
The statistical data association technique is an important approach for analyzing long sequences of images in Computer Vision. Although it has been studied extensively in other domains such as radar imagery, it was introduced only recently in Computer Vision, and is already recognized as an efficient approach to solving correspondence and motion problems. This paper has two purposes. The first is to present a general formulation of token tracking. The parameterization of tokens is not addressed. This might be useful to those who are not familiar with statistical tracking techniques. The second is to introduce some strategies for tracking with emphasis on practical importance. They include beam search for resolving multiple matches, support of existence for discarding false matches, and locking on reliable tokens and maximizing local rigidity for handling combinatorial explosion. We have implemented those strategies in a 3D line segment tracking algorithm and found them very useful.
Keywords: Token Tracking, Matching, Cluttered Scenes, Search Strategies
Back to main publications
or Email me for a paper reprint
This paper addresses how to find clusters based on proximity, and planar
facets based on coplanarity, from 3D line segments obtained from stereo. The
proposed methods are efficient and have been tested on many real stereo
data sets. These procedures are indispensable in many applications including
scene interpretation, object modeling and object recognition. We show their
application to 3D motion determination. We have developed an algorithm based
on the hypothesize-and-verify paradigm to register two consecutive 3D frames
obtained from stereo and estimate their transformation/motion. By
grouping 3D line segments in each frame into clusters and planes, we can
effectively reduce the complexity of the hypothesis generation phase.
Keywords: Grouping, Line segments, Planes, Clusters, Uncertainty, Motion estimation.
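A coplanarity test of the kind used for grouping segments into planar facets can be sketched as follows. Two segments are treated as coplanar when the distance between their supporting lines is below a tolerance. The fixed tolerance `tol` is a hypothetical simplification, since the paper accounts for measurement uncertainty, and the function names are illustrative only.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def line_distance(seg_a, seg_b):
    """Distance between the supporting lines of two 3D segments."""
    d1 = _sub(seg_a[1], seg_a[0])
    d2 = _sub(seg_b[1], seg_b[0])
    n = _cross(d1, d2)
    n_norm = math.sqrt(sum(x * x for x in n))
    if n_norm < 1e-12:
        # Parallel lines always lie in a common plane.
        return 0.0
    w = _sub(seg_b[0], seg_a[0])
    return abs(sum(w[i] * n[i] for i in range(3))) / n_norm


def are_coplanar(seg_a, seg_b, tol=0.01):
    return line_distance(seg_a, seg_b) <= tol
```

Segments that pass this pairwise test are candidates for the same plane, so hypotheses need only be generated within a group.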
A method has been developed for registering two 3D point clouds obtained
by correlation-based stereo. Registering two sets of geometric
primitives is in general a very difficult and unsolved problem.
Fortunately, in many applications, a priori knowledge considerably
simplifies the problem. For example, the motion between two successive
positions is usually either small or approximately known. From this rough
estimate, our algorithm computes the motion with very good precision,
which is necessary for obtaining a satisfactory model of the
environment. The observed objects are represented by 3D point clouds.
These points are considered as samples of a surface. No constraint is
imposed a priori on the form of the objects.
The proposed algorithm is based on iteratively matching the points of one
view to their nearest neighbors in the other view.
A statistical method based on the distribution of distances
is used to eliminate outlier matches.
A least-squares technique is used to estimate
the 3D motion from the point correspondences. Applying this
motion reduces the average distance between the surfaces in the two
sets. Real data have been
used to test this algorithm. The results show that it is
efficient and robust, and that it yields an accurate motion estimate.
Keywords: Surface Matching, 3D Registration,
Motion Estimation, Dynamic Scene Analysis, Robot Vision
The problem of matching is one of the bottlenecks in computer
vision. We identify three categories of matching: stereovision, object
recognition, and image sequence analysis. This report tries to provide a
complete survey of the work reported in the literature, with emphasis on
matching between two (2D or 3D) images in a sequence.
Keywords:
Matching, Stereovision, Object Recognition, Image Sequence Analysis
We describe an analytical method for recovering 3-D motion and structure of
four or more points from one motion of a stereo rig. The extrinsic parameters
are unknown. The motion of the stereo rig is also unknown. Because it
exploits information redundancy, the approach gains over
the traditional "motion and structure from motion" approach
in that fewer features and fewer motions are
required, and thus more robust estimation of motion and structure can be
obtained. Since the constraint on the rotation matrix is not fully exploited
in the analytical method, nonlinear minimization can be used to improve the
result. Both computer-simulated data and real data are used to validate the
proposed algorithm, and very promising results are obtained.
Keywords: Stereovision, Structure from motion, Reconstruction, Calibration.
We address in this paper the problem of self-calibration and metric
reconstruction (up to a scale) from one unknown motion of an uncalibrated
stereo rig, assuming the coordinates of the principal point of each camera
are known (this assumption is not necessary if one more motion is
available). The epipolar constraint is first formulated for two uncalibrated
images. The problem then becomes one of estimating unknowns such that the
discrepancy from the epipolar constraint, in terms of distances between
points and their corresponding epipolar lines, is minimized. The
initialization of the unknowns is based on the work of Maybank, Luong and
Faugeras on self-calibration of a single moving camera, which requires
solving a set of so-called Kruppa equations. Redundancy of the information
contained in a sequence of stereo images makes this method more robust than
using a sequence of monocular images. Real data have been used to test the
proposed method, and the results obtained are quite good.
Keywords: Camera Calibration, Stereovision, Reconstruction, Self-calibration.
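The criterion described in this abstract, the distance between points and their corresponding epipolar lines, can be written down in a few lines. The sketch assumes a 3x3 fundamental matrix `F` given as nested lists and pixel coordinates given as `(u, v)` pairs. The function names and the example matrix, which corresponds to a purely horizontal displacement between the two images, are illustrative assumptions.

```python
import math


def epipolar_line(F, x):
    """Line l = F [u, v, 1]^T in the other image, as coefficients (a, b, c)."""
    h = (x[0], x[1], 1.0)
    return [sum(F[i][j] * h[j] for j in range(3)) for i in range(3)]


def point_line_distance(line, x):
    """Euclidean distance from point (u, v) to the line a*u + b*v + c = 0."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / math.hypot(a, b)


def symmetric_epipolar_cost(F, matches):
    """Sum of squared point-to-epipolar-line distances in both images."""
    Ft = [[F[j][i] for j in range(3)] for i in range(3)]
    total = 0.0
    for x1, x2 in matches:
        total += point_line_distance(epipolar_line(F, x1), x2) ** 2
        total += point_line_distance(epipolar_line(Ft, x2), x1) ** 2
    return total


# Epipolar lines are horizontal for this F: matched points share the same row.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
perfect = symmetric_epipolar_cost(F, [((10.0, 20.0), (15.0, 20.0))])
off_by_three = symmetric_epipolar_cost(F, [((10.0, 20.0), (15.0, 23.0))])
```

A minimization over the unknowns would adjust them, and hence `F`, until this cost is as small as possible.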
This paper proposes a robust approach to image matching by exploiting the
only available geometric constraint, namely, the epipolar constraint. The
images are
uncalibrated, namely the motion between them and the camera parameters
are not known. Thus, the images can be taken by different cameras or
a single camera at different time instants. If we make an exhaustive search
for the epipolar geometry, the complexity is prohibitively high.
The idea underlying our approach is to use classical techniques
(correlation and relaxation methods in our
particular implementation) to find an initial set of matches, and then use a
robust technique, the Least Median of Squares (LMedS), to discard
false matches in this set. The epipolar geometry can then be accurately
estimated using a meaningful image criterion. More matches are eventually found,
as in stereo matching, by using the recovered
epipolar geometry. A large number of experiments have been carried out, and
very good results have been obtained.
Regarding the relaxation technique, we define a new measure of matching
support, which allows a higher tolerance to deformation with respect to
rigid transformations in the image plane and a smaller contribution
for distant matches than for nearby ones.
A new strategy for updating matches is developed, which only selects
those matches having both high matching support and low matching ambiguity.
The update strategy is different from the classical "winner-take-all",
which is easily stuck at a local minimum, and also from
"loser-take-nothing", which is usually very slow. The proposed algorithm
has been widely tested and works remarkably well in a scene with many
repetitive patterns.
Keywords: Robust Matching, Epipolar Geometry, Fundamental Matrix, Least
Median of Squares (LMedS), Relaxation, Correlation.
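The Least Median of Squares principle can be illustrated on a much simpler model than the fundamental matrix. The sketch below fits a 2D line and assumes exhaustive enumeration of point pairs rather than random sampling, which keeps it deterministic. The robust scale estimate uses the constant 1.4826 and the finite-sample correction 1 + 5/(n - p) commonly associated with LMedS. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def lmeds_line(points):
    """Fit y = a*x + b by minimizing the median of squared residuals."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # A vertical pair cannot define y = a*x + b.
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        med = median((y - (a * x + b)) ** 2 for x, y in points)
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2], best[0]


def select_inliers(points, a, b, med, factor=2.5, eps=1e-9):
    """Keep points whose residual is within factor * robust sigma."""
    sigma = 1.4826 * (1.0 + 5.0 / (len(points) - 2)) * sqrt(med)
    return [(x, y) for x, y in points if abs(y - (a * x + b)) <= factor * sigma + eps]


# Eight points on y = 2x + 1 and three gross outliers.
points = [(x, 2 * x + 1) for x in range(8)] + [(1, 30), (4, -20), (6, 50)]
a, b, med = lmeds_line(points)
inliers = select_inliers(points, a, b, med)
```

Because the median ignores the worst half of the residuals, the fit is unaffected as long as fewer than half of the matches are false.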
We present in this paper an algorithm for determining 3D motion and
structure from correspondences of line segments between two
perspective images. To our knowledge, this paper is the first
investigation of the use of line segments in motion and structure from
motion. Classical methods use their geometric abstraction, namely
straight lines, but then three images are necessary for the motion and
structure determination process. In this paper we show that two views
are in general sufficient when we use line
segments. The assumption we use is that two matched line segments
contain the projection of a common part of the corresponding
line segment in space. Indeed, this is what we use to match line
segments between different views. Both synthetic and real data have
been used to test the proposed algorithm, and excellent results have
been obtained with real data containing a relatively large set of line
segments. The results are comparable with those obtained using
calibrated stereo.
Keywords: Motion, Structure from Motion, Line Segments,
Epipolar Geometry, Overlap, Dynamic Scene Analysis.
We present a novel technique for calibrating a binocular stereo rig by
using the information from both scenes and classical calibration
objects. The calibration provided by the classical methods is only valid
for the space near the position of the calibration object.
Our technique takes advantage of the rigidity of
the geometry between two cameras.
The idea is to first estimate precisely the epipolar geometry,
which is valid for a wide range in space, from all available
matches. This allows us to perform a projective reconstruction. Using the
a priori knowledge of the calibration object, we are eventually able
to calibrate the stereo rig in a Euclidean space. The proposed
technique has been tested with a number of real images, and
significant improvement has been observed.
Also in Proceedings of Europe-China
Workshop on Geometrical modelling and
Invariants for Computer Vision, pages 253-260, April 1995, Xi'an, China.
Click here to get a copy of the software
SFM compiled for SUN.
Keywords: Epipolar geometry, Fundamental matrix, Uncertainty, Covariance matrix, Epipolar band.
Keywords: Epipolar Geometry, Fundamental Matrix, Calibration, Reconstruction, Parameter Estimation, Robust Techniques, Uncertainty Characterization, Performance Evaluation, Software
Click here to get a copy of the software
FMatrix & Fdiff compiled for Solaris and Linux.
In order to achieve a 3D, either Euclidean or projective, reconstruction with
high precision, one has to consider lens distortion. In almost all work on
multiple-view problems in computer vision, a camera is modeled as a
pinhole. Lens distortion has usually been corrected off-line. This paper
intends to consider lens distortion as an integral part of a camera.
We first describe the epipolar geometry between two images with
lens distortion. For a point in one image, its corresponding point in the
other image should lie on a so-called epipolar curve. We then
investigate the possibility
of estimating the distortion parameters and the fundamental matrix based on
the generalized epipolar constraint. Experimental results with computer
simulation show that the distortion parameters can be estimated correctly if
the noise in image points is low and the lens distortion is severe. Otherwise,
it is better to treat the cameras as being distortion-free.
Keywords: Camera Calibration, Lens Distortion, Epipolar Geometry,
Fundamental Matrix, Epipolar Curve
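To show why a straight epipolar line becomes a curve once lens distortion is included, the sketch below assumes a first-order radial distortion model with a single coefficient `k1` about a distortion center `(cx, cy)`. This model and all names are assumptions made for illustration. The paper's exact parameterization is not reproduced here.

```python
def distort(x, y, k1, cx=0.0, cy=0.0):
    """Map an ideal (pinhole) point to its distorted position."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k1 * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale


def undistort(xd, yd, k1, cx=0.0, cy=0.0, iterations=20):
    """Invert distort() by fixed-point iteration (no closed form is used)."""
    dx, dy = xd - cx, yd - cy
    ux, uy = dx, dy
    for _ in range(iterations):
        scale = 1.0 + k1 * (ux * ux + uy * uy)
        ux, uy = dx / scale, dy / scale
    return cx + ux, cy + uy


# Sample the ideal epipolar line y = 0.3 and distort each sample.
# The distorted samples no longer share one y value: the line has become a curve.
k1 = -0.2
curve = [distort(x / 10.0, 0.3, k1) for x in range(-5, 6)]
recovered = undistort(*distort(0.5, 0.0, k1), k1)
```

When `k1` is small relative to the image noise, the curve is nearly indistinguishable from a line, which is consistent with the abstract's advice to treat such cameras as distortion-free.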
The success of an intelligent robotic system depends on the
performance of its vision system, which in turn depends to a great
extent upon the quality of its calibration. During the execution of
a task the vision system is subject to external influences such as
vibrations, thermal expansion, etc., which affect and possibly render
invalid the initial calibration. Moreover, it is possible that the
parameters of the vision system, such as the zoom or the focus, are
altered intentionally in order to perform specific vision tasks.
This paper describes a technique for automatically maintaining
calibration of stereovision systems over time without again using
any particular calibration apparatus. It uses all available
information, i.e. both spatial and temporal data. Uncertainty is
systematically manipulated and maintained. Synthetic and real data
are used to validate the proposed technique, and the results compare
very favourably with those given by classical calibration methods.
Keywords: Camera calibration, Calibration maintaining,
Dynamic vision, Pose determination, 3D vision.
This paper addresses the recovery of structure and motion from two
uncalibrated images of a scene under full perspective or under affine
projection. Epipolar geometry, projective reconstruction, and affine
reconstruction are elaborated in such a way that anyone with a knowledge of
linear algebra can understand the discussion without difficulty. A general
expression of the fundamental matrix is derived which is valid for any
projection model without lens distortion (including full perspective and
affine camera). A new technique for affine reconstruction from two affine
images is developed, which consists in first estimating the affine epipolar
geometry and then performing a triangulation for each point match with respect
to an implicit common affine basis. This technique is very efficient.
Keywords: Motion Analysis, Epipolar Geometry, Uncalibrated Images,
Non-Metric Vision, 3D Reconstruction, Fundamental Matrix.
We present in this paper a system which automatically builds, from
real images, a scene model containing both 3D geometric information
of the scene structure and its photometric information under various
illumination conditions. The geometric structure is recovered from
images taken from distinct viewpoints. Structure-from-motion and
correlation-based stereo techniques are used to match pixels between
images of different viewpoints and to reconstruct the scene in 3D
space. The photometric property is extracted from images taken under
different illumination conditions (orientation, position and
intensity of the light sources). This is achieved by computing a
low-dimensional linear space of the spatio-illumination volume, and
is represented by a set of basis images. The model that has been
built can be used to create realistic renderings from different
viewpoints and illumination conditions. Applications include object
recognition, virtual reality and product advertisement.
Keywords: Geometric modeling, Representation, 3D reconstruction,
Shading (illumination), CAD/CAM, Virtual reality, Rendering, Object recognition.
Use of uncalibrated images has found many applications such as image
synthesis. However, it is not easy to specify the desired position
of the new image in projective or affine space. This paper proposes
to recover Euclidean structure from uncalibrated images using
domain knowledge such as distances and angles. The knowledge we have
is usually about an object category, but not very precise for the
particular object being considered. The variation (fuzziness) is
modeled as a Gaussian variable. Six types of common knowledge are
formulated. Once we have a Euclidean
description, the task to specify the desired position in Euclidean
space becomes trivial. The proposed technique is then applied to
synthesis of new facial images. A number of difficulties existing in
image synthesis are identified and solved. For example, we propose
to use edge points to deal with occlusion.
Keywords: 3D reconstruction, uncalibrated images, image
synthesis, representation, fuzzy domain knowledge.
The three best known criteria in two-view motion analysis are based,
respectively, on the distances between points and their corresponding
epipolar lines, on the gradient-weighted epipolar errors, and on the
distances between points and the reprojections of their reconstructed
points. The last one has a better statistical interpretation, but is,
however, much slower than the first two. In this paper, we show
that the last two criteria are equivalent when the epipoles are at infinity,
and differ from each other only a little even when the epipoles are in the
image. The first two criteria are equivalent only when the epipoles are at
infinity and when the observed object has the same scale in the two
images. This suggests that the second criterion is sufficient in practice
because of its computational efficiency. The result is valid for both
calibrated and uncalibrated images.
Keywords: Motion analysis, multiple-view geometry, 3D
reconstruction, optimization criteria, comparison.
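The first two criteria compared in this abstract can be evaluated side by side with a short sketch. It assumes a 3x3 fundamental matrix `F` as nested lists and pixel points as `(u, v)` pairs, and the function names are hypothetical. The example matrix places both epipoles at infinity with equal scale in the two images, the special case in which the abstract states that the first two criteria are equivalent.

```python
def _terms(F, x1, x2):
    """Return the epipolar residual and the squared gradient terms."""
    h1 = (x1[0], x1[1], 1.0)
    h2 = (x2[0], x2[1], 1.0)
    Fx1 = [sum(F[i][j] * h1[j] for j in range(3)) for i in range(3)]
    Ftx2 = [sum(F[j][i] * h2[j] for j in range(3)) for i in range(3)]
    residual = sum(h2[i] * Fx1[i] for i in range(3))
    g2 = Fx1[0] ** 2 + Fx1[1] ** 2    # squared norm of the line in image 2
    g1 = Ftx2[0] ** 2 + Ftx2[1] ** 2  # squared norm of the line in image 1
    return residual, g1, g2


def epipolar_distance_error(F, x1, x2):
    """Criterion 1: sum of squared point-to-epipolar-line distances."""
    e, g1, g2 = _terms(F, x1, x2)
    return e * e * (1.0 / g1 + 1.0 / g2)


def gradient_weighted_error(F, x1, x2):
    """Criterion 2: squared epipolar residual divided by its squared gradient norm."""
    e, g1, g2 = _terms(F, x1, x2)
    return e * e / (g1 + g2)


F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
first = epipolar_distance_error(F, (10.0, 20.0), (15.0, 23.0))
second = gradient_weighted_error(F, (10.0, 20.0), (15.0, 23.0))
```

For this particular `F` the two values differ only by a constant factor of 4 for every match, so minimizing either one gives the same estimate.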
The standard 2-stage algorithm first estimates the 9 essential
parameters defined up to a scale factor and then refines the motion
estimation based on some statistically optimal criteria. We propose
in this paper a novel approach by introducing an intermediate stage
which consists in estimating a 3×3 matrix defined up to a
scale factor by imposing the rank-2 constraint (the matrix
has seven independent parameters). The idea is to gradually
project parameters estimated in a high-dimensional space onto a
slightly lower-dimensional space, namely from 8 dimensions to 7 and
finally to 5. Experiments with synthetic and real data show a
considerable improvement over the 2-stage algorithm. Our conjecture
from this work is that the imposition of the constraints arising
from projective geometry should be used as an intermediate step in
order to obtain reliable 3D Euclidean motion and structure
estimation from multiple calibrated images.
Keywords: Motion and stereo, 3D reconstruction,
Structure from motion, Multiple-view geometry, Gradual constraint enforcing
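The dimension counts 8, 7 and 5 quoted in the abstract can be traced with a short parameter count. The symbols $M$, $R$ and $\mathbf{t}$ are notation assumed here for illustration, and the final line uses the standard decomposition of an essential matrix into a rotation and a translation direction.

```latex
\begin{align*}
\text{Stage 1:}\quad & M \in \mathbb{R}^{3\times 3},\ M \sim \lambda M
  && 9 - 1 = 8 \text{ parameters} \\
\text{Stage 2:}\quad & \det M = 0 \ (\text{rank 2})
  && 8 - 1 = 7 \text{ parameters} \\
\text{Stage 3:}\quad & M = [\mathbf{t}]_\times R,\ \|\mathbf{t}\| = 1
  && 3 + 2 = 5 \text{ parameters}
\end{align*}
```

Each stage removes only a few degrees of freedom, so the estimate from one stage can serve as a nearby starting point for the next.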
There is emerging interest in both the computer vision and computer
graphics communities in obtaining photorealistic models of a scene
or an object from real images.
This paper presents a tentative review of the computer vision
techniques used in such modeling which guarantee the generated views
to be geometrically correct.
The topics covered include mosaicking for building environment
maps, CAD-like modeling for building 3D geometric models together
with texture maps extracted from real images, image-based rendering for
synthesizing new views from uncalibrated images, and techniques for
modeling the appearance variation of a scene or an object under
different illumination conditions. Major issues and difficulties are
addressed.
Keywords: Photorealistic modeling, image-based rendering,
multiple-view geometry, photometric models, CAD, camera calibration, 3D
reconstruction, uncalibrated images, domain knowledge, illumination variation.
In this paper, we investigate the use of two types of features extracted
from face images for recognizing facial expressions. The first type is the
geometric positions of a set of fiducial points on a face. The second type
is a set of multi-scale and multi-orientation Gabor wavelet coefficients
extracted from the face image at the fiducial points. They can be used
either independently or jointly. The architecture we developed is based on a
two-layer perceptron. The recognition performance with different types of
features has been compared, which shows that Gabor wavelet coefficients are
much more powerful than geometric positions. Furthermore, since the first
layer of the perceptron actually performs a nonlinear reduction of the
dimensionality of the feature space, we have also studied the desired number
of hidden units, i.e., the appropriate dimension to represent a facial
expression in order to achieve a good recognition rate. It turns out that
five to seven hidden units are probably enough to represent the space of
facial expressions.
Keywords: Facial expression recognition, learning, Gabor wavelets,
multilayer perceptron.
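The two-layer architecture can be pictured with a minimal forward pass. The `tanh` hidden activation, the softmax output, the choice of seven output classes and the toy weights are all assumptions made for this sketch and are not taken from the paper. Only the idea that a small hidden layer compresses the feature vector comes from the abstract.

```python
import math


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: features -> hidden code -> class probabilities."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


# Toy network: 4 input features, 6 hidden units, 7 output classes.
w_hidden = [[0.1 * (i - j) for j in range(4)] for i in range(6)]
b_hidden = [0.0] * 6
w_out = [[0.05 * (k + i) for i in range(6)] for k in range(7)]
b_out = [0.0] * 7
hidden, probabilities = forward([1.0, 0.5, -0.5, 2.0], w_hidden, b_hidden, w_out, b_out)
```

The `hidden` list is the low-dimensional code. Its length is the quantity whose appropriate value the abstract investigates.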