Motion estimation is a central problem in dynamic scene analysis. Although it is easier to estimate motion parameters from 3D data than from 2D images, the task is not trivial because the 3D data we have are almost always corrupted by noise. This article presents a comparative study of motion estimation from 3D line segments. Two representations of line segments and two representations of rotation are described. With the different representations of line segments and rotation, a number of methods for motion estimation are presented, including the Extended Kalman Filter, a general minimization process and the Singular Value Decomposition. These methods are compared using both synthetic and real data obtained by a trinocular stereo system. We observe that the Extended Kalman Filter with the rotation-axis representation of rotation is preferable. We note that all methods discussed in this article can be applied directly to 3D point data.

**Keywords:** Motion Estimation, Motion from
Stereo, Noisy System, Nonlinear System, Minimization.
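The Singular Value Decomposition route can be sketched for the 3D point case, to which, as noted above, all the methods apply. This is a minimal sketch assuming noise-free point correspondences; the function name and the reflection guard are illustrative, not taken from the article:

```python
import numpy as np

def estimate_rigid_motion(P, Q):
    """Estimate rotation R and translation t such that Q ~ R @ P + t,
    via SVD of the 3x3 cross-covariance of the centered point sets."""
    p_bar = P.mean(axis=0)
    q_bar = Q.mean(axis=0)
    H = (P - p_bar).T @ (Q - q_bar)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # guard against an improper rotation (reflection)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t
```

With exact correspondences the closed form recovers the motion directly, with no iteration.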

Back to main publications or Email me for a paper reprint

In this paper, we describe a method for computing the movement of objects as well as that of a mobile robot from a sequence of stereo frames. Stereo frames are obtained at different instants by a stereo rig, when the mobile robot navigates in an unknown environment possibly containing some moving rigid objects. An approach based on rigidity constraints is presented for registering two stereo frames. We demonstrate how the uncertainty of measurements can be integrated with the formalism of the rigidity constraints. A new technique is described to match very noisy segments. The influence of egomotion on observed movements of objects is discussed in detail. Egomotion is first determined and then eliminated before determination of the motion of objects. The proposed algorithm is completely automatic. Experimental results are provided. Some remarks conclude this paper.

**Keywords:** Motion from Stereo, Egomotion, Multiple Object Motions, Mobile Robot,
3D Matching, Rigidity Constraints, Uncertainty of Measurements.


We present a method for estimating 3D displacements from two stereo frames. It is based upon the hypothesize-and-verify paradigm which is used to match 3D line segments between the two frames. In order to reduce the complexity of the method, we make the assumption that objects are rigid. We formulate a set of complete rigidity constraints for 3D line segments and integrate the uncertainty of measurements in this formulation. The hypothesize and verify stages of the method use an Extended Kalman Filter to produce estimates of the displacements and of their uncertainty. In the experimental sections, the algorithm is shown to work on indoor and natural scenes. Furthermore it is easily extended, as also shown, to the case where several mobile objects are present. The method is quite robust, fast, and has been thoroughly tested on hundreds of real stereo frames.

**Keywords:** Motion from stereo, 3D matching,
rigidity constraints, uncertainty, hypothesize-and-verify, multiple
object motions, extended Kalman filtering, robot vision.


We address the problem of computing the three-dimensional motions of objects in a long sequence of stereo frames. Our approach is bottom-up and consists of two levels. The first level deals with the tracking of 3D tokens from frame to frame and the estimation of their kinematics. The processing is completely parallel for each token. The second level groups tokens into objects based on their kinematic parameters, controls the processing at the low level to cope with problems such as occlusion, disappearances and appearances of tokens, and provides information to other components of the system. We have implemented this approach using 3D line segments obtained from stereo as the tokens. We use classical kinematics and derive closed-form solutions for some special, but useful, cases of motions. The motion computation problem is then formulated as a tracking problem in order to apply the extended Kalman filter. The tracking is performed by feedforward computation in a prediction-matching-update loop in which multiple matches can be handled. The individual line segments can be grouped into objects according to the similarity of their kinematic parameters. Experiments using synthetic and real data have been carried out and good results can be observed.

**Keywords:** Image Sequence Analysis, 3D Motion Tracking and
Computation, Kinematic Model, 3D Token tracker, Multiple Object
Motions, Grouping, 3D Vision.


This article describes a system for incrementally building a world model with a mobile robot in an unknown environment. The model is, for the moment, segment-based. A trinocular stereo system is used to build a local map of the environment. A global map is obtained by integrating a sequence of stereo frames taken as the robot navigates in the environment. The emphasis of this article is on the representation of the uncertainty of 3D segments from stereo and on the integration of segments from multiple views. The proposed representation is simple and very convenient for characterizing the uncertainty of segments. A Kalman filter is used to merge matched line segments. An important characteristic of our integration strategy is that a segment observed by the stereo system corresponds to only one part of the segment in space, so the union of the different observations gives a better estimate of the segment in space. We have succeeded in integrating 35 stereo frames taken in our robot room.

**Keywords:** Uncertainty Representation, Multiple View Integration,
World Model Builder, 3D Vision, Mobile Robot.
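The Kalman-filter merging of matched observations can be illustrated by the static update that fuses two Gaussian estimates of the same quantity. This is a generic sketch, not the article's segment parameterization; the names are illustrative:

```python
import numpy as np

def fuse(x1, P1, x2, P2):
    """Minimum-variance combination of two estimates of the same
    quantity, with covariances P1 and P2 (static Kalman update)."""
    K = P1 @ np.linalg.inv(P1 + P2)   # Kalman gain
    x = x1 + K @ (x2 - x1)
    P = P1 - K @ P1                   # equals (P1^-1 + P2^-1)^-1
    return x, P
```

Fusing two equally uncertain observations halves the variance, which is why integrating many views sharpens the segment estimates.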


This article describes the work at INRIA on obstacle avoidance and trajectory planning for a mobile robot using stereovision. Our mobile robot is equipped with a trinocular vision system which is being put into hardware and will be capable of delivering 3D maps of the environment at rates between 1 and 5 Hz. Those 3D maps contain line segments extracted from the images and reconstructed in three dimensions. They are used for a variety of tasks including obstacle avoidance and trajectory planning.

For those two tasks, we project the 3D line segments onto the ground floor to obtain a two-dimensional map, simplify the map according to some simple geometric criteria, and use the remaining 2D segments to construct a tessellation, more precisely a triangulation, of the ground floor. This tessellation has several advantages:

- It is adapted to the structure of the environment, since all stereo segments are edges of triangles in the tessellation.
- It can be efficiently computed (the algorithm we use has a complexity of *O(n)*, where *n* is the number of segments used).
- It is dynamic, in the sense that segments can be added to or removed from an existing triangulation efficiently.
- It can be computed in parallel quite easily.

We show a variety of real examples in which our robot navigates freely in real indoor environments using this system.


A heuristic method has been developed for registering two sets of 3-D curves obtained by using an edge-based stereo system, or two dense 3-D maps obtained by using a correlation-based stereo system. Geometric matching in general is a difficult unsolved problem in computer vision. Fortunately, in many practical applications, some a priori knowledge exists which considerably simplifies the problem. In visual navigation, for example, the motion between successive positions is usually approximately known. From this initial estimate, our algorithm computes observer motion with very good precision, which is required for environment modeling (e.g., building a Digital Elevation Map). Objects are represented by a set of 3-D points, which are considered as the samples of a surface. No constraint is imposed on the form of the objects. The proposed algorithm is based on iteratively matching points in one set to the closest points in the other. A statistical method based on the distance distribution is used to deal with outliers, occlusion, appearance and disappearance, which allows us to do subset-subset matching. A least-squares technique is used to estimate 3-D motion from the point correspondences, which reduces the average distance between points in the two sets. Both synthetic and real data have been used to test the algorithm, and the results show that it is efficient and robust, and yields an accurate motion estimate.

**Keywords:** Free-Form Curve and Surface Matching, 3-D
Registration, Motion Estimation, Dynamic Scene Analysis, 3-D Vision.
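The iterative matching loop described above can be sketched for 3D point sets. This minimal version omits the statistical outlier test based on the distance distribution; the function name and the brute-force nearest-neighbour search are illustrative:

```python
import numpy as np

def icp(P, Q, n_iters=20):
    """Minimal iterative-closest-point sketch: match each point of P to
    its nearest neighbour in Q, solve the closed-form least-squares
    rigid motion, and repeat."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(n_iters):
        P_moved = P @ R.T + t
        # brute-force nearest neighbour in Q for every moved point
        d2 = ((P_moved[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
        M = Q[d2.argmin(axis=1)]
        # closed-form least-squares rigid motion between P and matches M
        p_bar, m_bar = P.mean(axis=0), M.mean(axis=0)
        H = (P - p_bar).T @ (M - m_bar)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
        R = Vt.T @ D @ U.T
        t = m_bar - R @ p_bar
    return R, t
```

The loop relies on the initial motion estimate being good enough for most nearest-neighbour matches to be correct, which is exactly the a priori knowledge assumed in the abstract.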


The statistical data association technique is an important approach to analyzing long sequences of images in Computer Vision. Although it has been extensively studied in other domains such as radar imagery, it was introduced only recently in Computer Vision, and is already recognized as an efficient approach to solving correspondence and motion problems. This paper has two purposes. The first is to present a general formulation of token tracking, which might be useful to those who are not familiar with statistical tracking techniques; the parameterization of tokens is not addressed. The second is to introduce some strategies for tracking, with emphasis on their practical importance. They include beam search for resolving multiple matches, support of existence for discarding false matches, and locking on reliable tokens and maximizing local rigidity for handling combinatorial explosion. We have implemented those strategies in a 3D line segment tracking algorithm and found them very useful.

**Keywords:** Token Tracking, Matching, Cluttered Scenes, Search
Strategies


We address in this paper how to find clusters based on proximity and planar facets based on coplanarity from 3D line segments obtained from stereo. The proposed methods are efficient and have been tested on many real stereo data sets. These procedures are indispensable in many applications, including scene interpretation, object modeling and object recognition. We show their application to 3D motion determination. We have developed an algorithm based on the hypothesize-and-verify paradigm to register two consecutive 3D frames obtained from stereo and estimate their transformation/motion. By grouping the 3D line segments in each frame into clusters and planes, we can effectively reduce the complexity of the hypothesis generation phase.

**Keywords:** Grouping, Line segments, Planes, Clusters, Uncertainty, Motion estimation.


A method has been developed for registering two 3D point clouds obtained using correlation-based stereo. Registering two sets of geometric primitives is in general a very difficult, unsolved problem. Fortunately, in many applications, a priori knowledge considerably simplifies the problem. For example, the motion between two successive positions is usually either small or approximately known. From this coarse estimate, our algorithm computes the motion with very good precision, which is necessary to obtain a satisfactory model of the environment. The observed objects are represented by 3D point clouds. These points are considered as samples of a surface. No constraint is imposed a priori on the shape of the objects. The proposed algorithm is based on iteratively matching the points of one view with their closest neighbours in the other view. A statistical method based on the distance distribution is used to eliminate aberrant matches. A least-squares technique is used to estimate the 3D motion from the point correspondences. Applying this motion reduces the average distance between the surfaces in the two sets. Real data have been used to test this algorithm. The results show that it is efficient and robust, and that it gives an accurate estimate of the motion.


The problem of matching is one of the bottlenecks in computer vision. We identify three categories of matching: stereovision, object recognition, and image sequence analysis. This report tries to provide a complete survey of the work reported in the literature, with emphasis on matching between two (2D or 3D) images in a sequence.



We describe an analytical method for recovering the 3-D motion and structure of four or more points from one motion of a stereo rig. The extrinsic parameters are unknown. The motion of the stereo rig is also unknown. Because of the exploitation of information redundancy, the approach gains over the traditional "*motion and structure from motion*" approach in that fewer features and fewer motions are required, and thus a more robust estimate of motion and structure can be obtained. Since the constraint on the rotation matrix is not fully exploited in the analytical method, nonlinear minimization can be used to improve the result. Both computer-simulated data and real data are used to validate the proposed algorithm, and very promising results are obtained.

**Keywords:** Stereovision, Structure from motion, Reconstruction, Calibration.


We address in this paper the problem of self-calibration and metric reconstruction (up to a scale) from one unknown motion of an uncalibrated stereo rig, assuming the coordinates of the principal point of each camera are known (this assumption is not necessary if one more motion is available). The epipolar constraint is first formulated for two uncalibrated images. The problem then becomes one of estimating the unknowns such that the discrepancy from the epipolar constraint, in terms of distances between points and their corresponding epipolar lines, is minimized. The initialization of the unknowns is based on the work of Maybank, Luong and Faugeras on the self-calibration of a single moving camera, which requires solving a set of so-called Kruppa equations. The redundancy of the information contained in a sequence of stereo images makes this method more robust than using a sequence of monocular images. Real data have been used to test the proposed method, and the results obtained are quite good.

**Keywords:** Camera Calibration, Stereovision, Reconstruction, Self-calibration.


This paper proposes a robust approach to image matching by exploiting the only available geometric constraint, namely the epipolar constraint. The images are uncalibrated; that is, the motion between them and the camera parameters are not known. Thus, the images can be taken by different cameras or by a single camera at different time instants. An exhaustive search for the epipolar geometry would have prohibitively high complexity. The idea underlying our approach is to use classical techniques (correlation and relaxation methods in our particular implementation) to find an initial set of matches, and then use a robust technique, the Least Median of Squares (LMedS), to discard false matches in this set. The epipolar geometry can then be accurately estimated using a meaningful image criterion. More matches are eventually found, as in stereo matching, by using the recovered epipolar geometry. A large number of experiments have been carried out, and very good results have been obtained. Regarding the relaxation technique, we define a new measure of matching support, which allows a higher tolerance to deformation with respect to rigid transformations in the image plane and a smaller contribution for distant matches than for nearby ones. A new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity. This update strategy is different from the classical "winner-take-all", which easily gets stuck in a local minimum, and also from "loser-take-nothing", which is usually very slow. The proposed algorithm has been widely tested and works remarkably well on scenes with many repetitive patterns.

**Keywords:** Robust Matching, Epipolar Geometry, Fundamental Matrix, Least
Median of Squares (LMedS), Relaxation, Correlation.


We present in this paper an algorithm for determining 3D motion and structure from correspondences of *line segments* between two perspective images. To our knowledge, this paper is the first investigation of the use of line segments in motion and structure from motion. Classical methods use their geometric abstraction, namely straight lines, but then three images are necessary for the motion and structure determination process. In this paper we show that two views are in general sufficient when we use line segments. The assumption we use is that two matched line segments contain the projection of a *common part* of the corresponding line segment in space; indeed, this is what we use to match line segments between different views. Both synthetic and real data have been used to test the proposed algorithm, and excellent results have been obtained with real data containing a relatively large set of line segments. The results are comparable with those obtained using calibrated stereo.

**Keywords:** Motion, Structure from Motion, Line Segments,
Epipolar Geometry, Overlap, Dynamic Scene Analysis.


Also in

We present a novel technique for calibrating a binocular stereo rig by using information from both scenes and classical calibration objects. The calibration provided by classical methods is only valid for the space near the position of the calibration object. Our technique takes advantage of the rigidity of the geometry between the two cameras. The idea is to first estimate precisely, from all available matches, the epipolar geometry, which is valid for a wide range in space. This allows us to perform a projective reconstruction. Using the a priori knowledge of the calibration object, we are eventually able to calibrate the stereo rig in a Euclidean space. The proposed technique has been tested with a number of real images, and significant improvement has been observed.

**Keywords:** Camera Calibration, Stereovision, Epipolar Geometry,
Projective Reconstruction.


Almost all problems in computer vision are related in one form or another to the problem of estimating parameters from noisy data. In this tutorial, we present the most commonly used techniques for parameter estimation. These include linear least-squares (pseudo-inverse and eigen analysis); orthogonal least-squares; gradient-weighted least-squares; bias-corrected renormalization; Kalman filtering; and robust techniques (clustering, regression diagnostics, M-estimators, least median of squares). Particular attention is devoted to the choice of appropriate minimization criteria and to the robustness of the different techniques. Their application to conic fitting is described.

**Keywords:** Parameter estimation, Least-squares, Bias correction, Kalman
filtering, Robust regression.
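One of the linear least-squares variants mentioned, eigen analysis, can be sketched on the conic-fitting application. This is a minimal sketch using the algebraic distance with a unit-norm constraint; it does not reproduce the tutorial's bias-corrected variants:

```python
import numpy as np

def fit_conic(x, y):
    """Fit a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 by minimizing
    ||D p||^2 subject to ||p|| = 1, i.e. take the eigenvector of D^T D
    associated with the smallest eigenvalue."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, V = np.linalg.eigh(D.T @ D)   # eigenvalues in ascending order
    return V[:, 0]                   # eigenvector of the smallest one
```

The unit-norm constraint avoids the trivial all-zero solution that an unconstrained least-squares formulation of this homogeneous equation would return.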


This paper describes a complete stereovision system, which was originally developed for planetary applications but can be used for other applications such as object modeling. A new, effective on-site calibration technique has been developed, which can make use of information from the surrounding environment as well as information from the calibration apparatus. A correlation-based stereo algorithm is used, which can produce sufficiently dense range maps with an algorithmic structure suited to fast implementations. A technique based on iterative closest-point matching has been developed for the registration of successive depth maps and the computation of the displacements between successive positions. A statistical method based on the distance distribution is integrated into this registration technique, which allows us to deal with important problems such as outliers, occlusion, appearance and disappearance. Finally, the registered maps are expressed in the same coordinate system and are fused, erroneous data are eliminated through consistency checking, and a global digital elevation map is built incrementally.

**Keywords:** Motion Analysis, Structure from Motion, Gradual Constraint
Enforcing, Multistage Algorithm


The classical approach to the motion and structure estimation problem from two perspective projections consists of two stages: (i) using the 8-point algorithm to estimate the 9 essential parameters defined up to a scale factor, which is a linear estimation problem; (ii) refining the motion estimation based on some statistically optimal criteria, which is a nonlinear estimation problem on a five-dimensional space. Unfortunately, the results obtained using this approach are often not satisfactory, especially when the motion is small or when the observed points are close to a degenerate surface (e.g. a plane). The problem is that the second stage is very sensitive to the initial guess, and that it is very difficult to obtain a precise initial estimate from the first stage. This is because we perform a projection of a set of quantities which are estimated in a space of 8 dimensions, much higher than that of the real space which is five-dimensional. We propose in this paper a novel approach by introducing an intermediate stage which consists in estimating a 3×3 matrix defined up to a scale factor by imposing the *zero-determinant constraint* (the matrix has seven independent parameters, and is known as the fundamental matrix). The idea is to *gradually* project parameters estimated in a high dimensional space onto a *slightly lower* space, namely from 8 dimensions to 7 and finally to 5. The proposed approach has been tested with synthetic and real data, and a considerable improvement has been observed for the delicate situations mentioned above. Our conjecture from this work is that the imposition of the constraints arising from projective geometry should be used as an intermediate step in order to obtain reliable 3D Euclidean motion and structure estimation from multiple calibrated images.

**Keywords:** Motion Analysis, Structure from Motion, Gradual Constraint
Enforcing, Multistage Algorithm.
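The zero-determinant (rank-2) constraint on a 3×3 matrix is commonly imposed by an SVD projection; the sketch below shows that standard textbook step, which is not necessarily the parameterization used inside the paper's minimization:

```python
import numpy as np

def enforce_rank2(F):
    """Project a 3x3 matrix onto the nearest rank-2 matrix (in the
    Frobenius norm) by zeroing its smallest singular value, so that
    det(F) = 0 as required of a fundamental matrix."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

This projection is what reduces the parameter count from 8 (a homogeneous 3×3 matrix) to the 7 independent parameters mentioned in the abstract.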


Updated version of

**Keywords:** Epipolar geometry, Fundamental matrix, Uncertainty,
Covariance matrix, Epipolar band.


**Keywords:** Epipolar Geometry, Fundamental Matrix, Calibration,
Reconstruction, Parameter Estimation, Robust Techniques, Uncertainty
Characterization, Performance Evaluation, Software


In order to achieve a 3D, either Euclidean or projective, reconstruction with
high precision, one has to consider lens distortion. In almost all work on
multiple-views problems in computer vision, a camera is modeled as a
pinhole. Lens distortion has usually been corrected off-line. This paper
intends to consider lens distortion as an integral part of a camera.
We first describe the epipolar geometry between two images with
lens distortion. For a point in one image, its corresponding point in the
other image should lie on a so-called *epipolar curve*. We then
investigate the possibility
of estimating the distortion parameters and the fundamental matrix based on
the generalized epipolar constraint. Experimental results with computer
simulation show that the distortion parameters can be estimated correctly if
the noise in image points is low and the lens distortion is severe. Otherwise,
it is better to treat the cameras as being distortion-free.

**Keywords:** Camera Calibration, Lens Distortion, Epipolar Geometry,
Fundamental Matrix, Epipolar Curve


The success of an intelligent robotic system depends on the performance of its vision system, which in turn depends to a great extent upon the quality of its calibration. During the execution of a task, the vision system is subject to external influences such as vibrations and thermal expansion which affect, and possibly render invalid, the initial calibration. Moreover, it is possible that parameters of the vision system, such as the zoom or the focus, are altered intentionally in order to perform specific vision tasks. This paper describes a technique for automatically maintaining the calibration of stereovision systems over time without again using any particular calibration apparatus. It uses all available information, i.e. both spatial and temporal data. Uncertainty is systematically manipulated and maintained. Synthetic and real data are used to validate the proposed technique, and the results compare very favourably with those given by classical calibration methods.

**Keywords:** Camera calibration, Calibration maintaining,
Dynamic vision, Pose determination, 3D vision.


This paper addresses the recovery of structure and motion from two uncalibrated images of a scene under full perspective or under affine projection. Epipolar geometry, projective reconstruction, and affine reconstruction are elaborated in such a way that anyone with a knowledge of linear algebra can follow the discussion without difficulty. A general expression of the fundamental matrix is derived which is valid for any projection model without lens distortion (including full perspective and affine cameras). A new technique for affine reconstruction from two affine images is developed, which consists in first estimating the affine epipolar geometry and then performing a triangulation for each point match with respect to an implicit common affine basis. This technique is very efficient.

**Keywords:** Motion Analysis, Epipolar Geometry, Uncalibrated Images,
Non-Metric Vision, 3D Reconstruction, Fundamental Matrix.
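The linear estimation of the fundamental matrix that underlies such formulations can be sketched with the textbook eight-point algorithm; this version omits the coordinate normalization that a careful implementation would add, and the names are illustrative:

```python
import numpy as np

def eight_point(x1, x2):
    """Linear fundamental-matrix estimate from N >= 8 point matches
    x1[i] <-> x2[i] (Nx2 arrays): stack the epipolar constraints
    x2^T F x1 = 0, take the null vector, then enforce rank 2."""
    N = len(x1)
    A = np.zeros((N, 9))
    for i in range(N):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        A[i] = [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A
    U, s, Vt = np.linalg.svd(F)       # project onto rank-2 matrices
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

Each correspondence contributes one homogeneous linear equation in the nine entries of F, so eight generic matches determine F up to scale.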


We present in this paper a system which automatically builds, from real images, a scene model containing both 3D geometric information of the scene structure and its photometric information under various illumination conditions. The geometric structure is recovered from images taken from distinct viewpoints. Structure-from-motion and correlation-based stereo techniques are used to match pixels between images of different viewpoints and to reconstruct the scene in 3D space. The photometric property is extracted from images taken under different illumination conditions (orientation, position and intensity of the light sources). This is achieved by computing a low-dimensional linear space of the spatio-illumination volume, and is represented by a set of basis images. The model that has been built can be used to create realistic renderings from different viewpoints and illumination conditions. Applications include object recognition, virtual reality and product advertisement.

**Keywords:** Geometric modeling, Representation, 3D reconstruction,
Shading (illumination), CAD/CAM, Virtual reality, Rendering, Object recognition.


The use of uncalibrated images has found many applications, such as image synthesis. However, it is not easy to specify the desired position of the new image in projective or affine space. This paper proposes to recover Euclidean structure from uncalibrated images using domain knowledge such as distances and angles. The knowledge we have is usually about an object category, but is not very precise for the particular object being considered. The variation (fuzziness) is modeled as a Gaussian variable. Six types of common knowledge are formulated. Once we have a Euclidean description, the task of specifying the desired position in Euclidean space becomes trivial. The proposed technique is then applied to the synthesis of new facial images. A number of difficulties existing in image synthesis are identified and solved. For example, we propose to use edge points to deal with occlusion.

**Keywords:** 3D reconstruction, uncalibrated images, image
synthesis, representation, fuzzy domain knowledge.


The three best known criteria in two-view motion analysis are based, respectively, on the distances between points and their corresponding epipolar lines, on the gradient-weighted epipolar errors, and on the distances between points and the reprojections of their reconstructed points. The last one has a better statistical interpretation, but is, however, much slower than the first two. In this paper, we show that the last two criteria are equivalent when the epipoles are at infinity, and differ from each other only a little even when the epipoles are in the image. The first two criteria are equivalent only when the epipoles are at infinity and when the observed object has the same scale in the two images. This suggests that the second criterion is sufficient in practice because of its computational efficiency. The result is valid for both calibrated and uncalibrated images.

**Keywords:** Motion analysis, multiple-view geometry, 3D
reconstruction, optimization criteria, comparison.
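The first criterion, the distance between a point and its corresponding epipolar line, can be sketched as follows, assuming inhomogeneous 2-vector points and a given fundamental matrix F (the F used in the test is the rectified-stereo one, chosen only for illustration):

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance from point x2 (in image 2) to the epipolar line F @ x1
    of point x1 (in image 1); both points are inhomogeneous 2-vectors."""
    l = F @ np.append(x1, 1.0)                # line (a, b, c) in image 2
    return abs(l @ np.append(x2, 1.0)) / np.hypot(l[0], l[1])
```

The gradient-weighted criterion replaces this geometric normalization with the gradient of the epipolar error, which is what makes the two criteria so close in practice.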


The standard 2-stage algorithm first estimates the 9 essential parameters defined up to a scale factor and then refines the motion estimation based on some statistically optimal criteria. We propose in this paper a novel approach by introducing an intermediate stage which consists in estimating a 3×3 matrix defined up to a scale factor by imposing the *rank-2 constraint* (the matrix has seven independent parameters). The idea is to *gradually* project parameters estimated in a high dimensional space onto a *slightly lower*-dimensional space, namely from 8 dimensions to 7 and finally to 5. Experiments with synthetic and real data show a considerable improvement over the 2-stage algorithm. Our conjecture from this work is that the imposition of the constraints arising from projective geometry should be used as an intermediate step in order to obtain reliable 3D Euclidean motion and structure estimation from multiple calibrated images.

**Keywords:** Motion and stereo, 3D reconstruction,
Structure from motion, Multiple-view geometry, Gradual constraint enforcing


There is emerging interest from both the computer vision and computer graphics communities in obtaining photorealistic models of a scene or an object from real images. This paper presents a tentative review of the computer vision techniques used in such modeling which guarantee that the generated views are geometrically correct. The topics covered include mosaicking for building environment maps, CAD-like modeling for building 3D geometric models together with texture maps extracted from real images, image-based rendering for synthesizing new views from uncalibrated images, and techniques for modeling the appearance variation of a scene or an object under different illumination conditions. Major issues and difficulties are addressed.

**Keywords:** Photorealistic modeling, image-based rendering,
multiple-view geometry, photometric models, CAD, camera calibration, 3D
reconstruction, uncalibrated images, domain knowledge, illumination variation.


In this paper, we investigate the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale, multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. The architecture we developed is based on a two-layer perceptron. The recognition performance with the different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, we have also studied the desired number of hidden units, i.e., the appropriate dimension for representing a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions.

**Keywords:** Facial expression recognition, learning, Gabor wavelets,
multilayer perceptron.


Go to Zhengyou Zhang home page