Zhengyou Zhang: Abstracts of Main Publications

## Abstracts of the papers

### Determining Motion from 3D Line Segments: A Comparative Study

Z. Zhang and O.D. Faugeras. International Journal of Image and Vision Computing, Vol. 9, No. 1, pages 10-19. February 1991.

Motion estimation is a very important problem in dynamic scene analysis. Although it is easier to estimate motion parameters from 3D data than from 2D images, it is not trivial since the 3D data we have are almost always corrupted by noise. This article presents a comparative study on motion estimation from 3D line segments. Two representations of line segments and two representations of rotation are described. With different representations of line segments and rotation, a number of methods for motion estimation are presented, including the Extended Kalman Filter, a general Minimization process and the Singular Value Decomposition. These methods are compared using both synthetic and real data obtained by a trinocular stereo. We observe that the Extended Kalman Filter with the rotation axis representation of rotation is preferable. We note that all methods discussed in this article can be directly applied to 3D point data.

Keywords: Motion Estimation, Motion from Stereo, Noisy System, Nonlinear System, Minimization.

### Analysis of a Sequence of Stereo Scenes Containing Multiple Moving Objects Using Rigidity Constraints

Z. Zhang, O.D. Faugeras and N. Ayache. In R. Kasturi and R.C. Jain (eds), Computer Vision: Principles, IEEE Computer Society Press, 1991.

In this paper, we describe a method for computing the movement of objects as well as that of a mobile robot from a sequence of stereo frames. Stereo frames are obtained at different instants by a stereo rig, when the mobile robot navigates in an unknown environment possibly containing some moving rigid objects. An approach based on rigidity constraints is presented for registering two stereo frames. We demonstrate how the uncertainty of measurements can be integrated with the formalism of the rigidity constraints. A new technique is described to match very noisy segments. The influence of egomotion on observed movements of objects is discussed in detail. Egomotion is first determined and then eliminated before determination of the motion of objects. The proposed algorithm is completely automatic. Experimental results are provided. Some remarks conclude this paper.

Keywords: Motion from Stereo, Egomotion, Multiple Object Motions, Mobile Robot, 3D Matching, Rigidity Constraints, Uncertainty of Measurements.

### Estimation of Displacements from Two 3D Frames Obtained from Stereo

Z. Zhang and O.D. Faugeras. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14, No. 12, pages 1141-1156, 1992.

We present a method for estimating 3D displacements from two stereo frames. It is based upon the hypothesize-and-verify paradigm which is used to match 3D line segments between the two frames. In order to reduce the complexity of the method, we make the assumption that objects are rigid. We formulate a set of complete rigidity constraints for 3D line segments and integrate the uncertainty of measurements in this formulation. The hypothesize and verify stages of the method use an Extended Kalman Filter to produce estimates of the displacements and of their uncertainty. In the experimental sections, the algorithm is shown to work on indoor and natural scenes. Furthermore it is easily extended, as also shown, to the case where several mobile objects are present. The method is quite robust, fast, and has been thoroughly tested on hundreds of real stereo frames.

Keywords: Motion from stereo, 3D matching, rigidity constraints, uncertainty, hypothesize-and-verify, multiple object motions, extended Kalman filtering, robot vision.
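As a rough illustration of the rigidity idea in this abstract, the sketch below checks whether two pairings of 3D line segments are compatible with a single rigid displacement. It compares two quantities that any rigid motion preserves: the angle between the supporting lines and the distance between them. This is not the complete constraint set formulated in the paper. The fixed tolerances `angle_tol` and `dist_tol` are hypothetical stand-ins for the uncertainty-weighted tests the abstract mentions, and all function names are illustrative.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(dot(a, a))

def pair_invariants(seg1, seg2):
    """Angle and distance between the supporting lines of two 3D segments.

    Each segment is a pair of 3D endpoints. Both quantities are unchanged
    by any rigid displacement applied to the two segments together.
    """
    d1 = sub(seg1[1], seg1[0])
    d2 = sub(seg2[1], seg2[0])
    # abs() because the orientation of a segment is arbitrary.
    cos_angle = abs(dot(d1, d2)) / (norm(d1) * norm(d2))
    angle = math.acos(min(1.0, cos_angle))
    n = cross(d1, d2)
    w = sub(seg2[0], seg1[0])
    if norm(n) < 1e-12:
        # Parallel lines: distance from a point of line 2 to line 1.
        dist = norm(cross(w, d1)) / norm(d1)
    else:
        dist = abs(dot(w, n)) / norm(n)
    return angle, dist

def rigidly_compatible(pair_a, pair_b, angle_tol=0.05, dist_tol=0.05):
    """True if the two segment pairs could be related by one rigid motion."""
    angle_a, dist_a = pair_invariants(*pair_a)
    angle_b, dist_b = pair_invariants(*pair_b)
    return abs(angle_a - angle_b) <= angle_tol and abs(dist_a - dist_b) <= dist_tol
```

A hypothesis pairing segment `s1` with `s1'` and `s2` with `s2'` would be kept only if `rigidly_compatible((s1, s2), (s1p, s2p))` holds, which prunes most wrong hypotheses before any displacement is estimated.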

### Three-Dimensional Motion Computation and Object Segmentation in a Long Sequence of Stereo Frames

Z. Zhang and O.D. Faugeras. International Journal of Computer Vision, Vol. 7, No. 3, pages 211-241, March 1992.

We address the problem of computing the three-dimensional motions of objects in a long sequence of stereo frames. Our approach is bottom-up and consists of two levels. The first level deals with the tracking of 3D tokens from frame to frame and the estimation of their kinematics. The processing is completely parallel for each token. The second level groups tokens into objects based on their kinematic parameters, controls the processing at the low level to cope with problems such as occlusion, disappearances and appearances of tokens, and provides information to other components of the system. We have implemented this approach using 3D line segments obtained from stereo as the tokens. We use classical kinematics and derive closed-form solutions for some special, but useful, cases of motions. The motion computation problem is then formulated as a tracking problem in order to apply the extended Kalman filter. The tracking is performed by feedforward computation in a prediction-matching-update loop in which multiple matches can be handled. The individual line segments can be grouped into objects according to the similarity of their kinematic parameters. Experiments using synthetic and real data have been carried out and good results can be observed.

Keywords: Image Sequence Analysis, 3D Motion Tracking and Computation, Kinematic Model, 3D Token tracker, Multiple Object Motions, Grouping, 3D Vision.
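The prediction-matching-update loop described in this abstract can be sketched for a single scalar token as follows. This is a deliberately reduced illustration: it uses a linear constant-velocity Kalman filter on one coordinate rather than the extended Kalman filter on 3D line segments used in the paper, and it resolves multiple candidate matches by simply taking the nearest gated measurement. The noise levels `q` and `r` and the gate value are assumptions chosen for the example.

```python
def predict(state, cov, dt=1.0, q=1e-3):
    """Constant-velocity prediction. state = [position, velocity]."""
    pos, vel = state
    p00, p01 = cov[0]
    p10, p11 = cov[1]
    new_state = [pos + dt * vel, vel]
    # cov' = F cov F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q).
    new_cov = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return new_state, new_cov

def update(state, cov, z, r=1e-2):
    """Kalman update with a position measurement z of variance r."""
    s = cov[0][0] + r                      # innovation variance
    k0, k1 = cov[0][0] / s, cov[1][0] / s  # Kalman gain
    y = z - state[0]                       # innovation
    new_state = [state[0] + k0 * y, state[1] + k1 * y]
    new_cov = [
        [(1.0 - k0) * cov[0][0], (1.0 - k0) * cov[0][1]],
        [cov[1][0] - k1 * cov[0][0], cov[1][1] - k1 * cov[0][1]],
    ]
    return new_state, new_cov

def track(frames, state, cov, gate=9.0, r=1e-2):
    """Run the prediction-matching-update loop over a list of frames.

    Each frame is a list of candidate position measurements. A candidate
    is accepted only if its squared normalized innovation is below `gate`.
    """
    for measurements in frames:
        state, cov = predict(state, cov)
        s = cov[0][0] + r
        candidates = [z for z in measurements if (z - state[0]) ** 2 / s <= gate]
        if candidates:
            z = min(candidates, key=lambda m: abs(m - state[0]))
            state, cov = update(state, cov, z, r)
    return state, cov
```

When no measurement passes the gate, the token is carried forward on prediction alone, which is one simple way to survive a temporary occlusion.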

### A 3D World Model Builder with a Mobile Robot

Z. Zhang and O.D. Faugeras. International Journal of Robotics Research, Vol. 11, No. 4, pages 269-285, 1992.

This article describes a system to incrementally build a world model with a mobile robot in an unknown environment. The model is, for the moment, segment-based. A trinocular stereo system is used to build a local map of the environment. A global map is obtained by integrating a sequence of stereo frames taken when the robot navigates in the environment. The emphasis of this article is on the representation of the uncertainty of 3D segments from stereo and on the integration of segments from multiple views. The proposed representation is simple and very convenient to characterize the uncertainty of segments. A Kalman filter is used to merge matched line segments. An important characteristic of our integration strategy is that a segment observed by the stereo system corresponds only to one part of the segment in space, so the union of the different observations gives a better estimate of the segment in space. We have succeeded in integrating 35 stereo frames taken in our robot room.

Keywords: Uncertainty Representation, Multiple View Integration, World Model Builder, 3D Vision, Mobile Robot.
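The two integration ideas in this abstract, Kalman merging of matched segments and taking the union of partial observations, can be sketched in a simplified form. The code below assumes each line parameter is scalar with an independent variance (a diagonal covariance), and that the extent of a segment along its supporting line is stored as an interval. Neither is claimed to be the paper's actual representation, and the function names are hypothetical.

```python
def kalman_fuse(x, var_x, z, var_z):
    """Fuse a current estimate x with a new observation z of the same quantity.

    This is the static Kalman update: the gain weights the observation by
    the relative confidence of the two values.
    """
    gain = var_x / (var_x + var_z)
    fused = x + gain * (z - x)
    fused_var = (1.0 - gain) * var_x
    return fused, fused_var

def fuse_parameters(params, variances, obs, obs_variances):
    """Apply the scalar update independently to each line parameter."""
    pairs = [kalman_fuse(x, vx, z, vz)
             for x, vx, z, vz in zip(params, variances, obs, obs_variances)]
    return [p for p, _ in pairs], [v for _, v in pairs]

def merge_extent(interval_a, interval_b):
    """Union of two observed extents along the fused supporting line."""
    return (min(interval_a[0], interval_b[0]), max(interval_a[1], interval_b[1]))
```

With equal variances the fused value is the average and the variance is halved, so repeated observations of the same segment steadily tighten the estimate while the stored extent grows to cover every observed part.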

### Obstacle Avoidance and Trajectory Planning for an Indoor Mobile Robot Using Stereo Vision and Delaunay Triangulation

M. Buffa, O.D. Faugeras, and Z. Zhang. In Vision-based Vehicle Guidance, Springer, New York, 1992, edited by I. Masaki, Chap. 13, pages 268-283.

This article describes the work at INRIA on obstacle avoidance and trajectory planning for a mobile robot using stereovision. Our mobile robot is equipped with a trinocular vision system which is being put into hardware and will be capable of delivering 3D maps of the environment at rates between 1 and 5 Hz. Those 3D maps contain line segments extracted from the images and reconstructed in three dimensions. They are used for a variety of tasks including obstacle avoidance and trajectory planning.

For those two tasks, we project the 3D line segments onto the ground floor to obtain a two-dimensional map, simplify the map according to some simple geometric criteria, and use the remaining 2D segments to construct a tessellation, more precisely a triangulation, of the ground floor. This tessellation has several advantages:

• It is adapted to the structure of the environment since all stereo segments are edges of triangles in the tessellation,
• It can be efficiently computed (the algorithm we use has a complexity O(n) if n is the number of segments used),
• It is dynamic, in the sense that segments can be added to or removed from an existing triangulation efficiently,
• It can be computed in parallel quite easily.

We use this triangulation as a support for further processing. We first determine free space, simply by marking those triangles which are empty, again a very simple processing step, and then use the graph formed by those triangles to generate collision-free trajectories. When new sensory data is acquired, the ground floor map is easily updated using the nice computational properties of the Delaunay triangulation, and the process is iterated.
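The use of the triangle graph for path generation can be sketched as follows. The example assumes the triangulation and the set of free (empty) triangles are already available, represents each triangle by its three vertex indices, and treats stereo segments as obstacle edges that cannot be crossed. A breadth-first search then returns a chain of adjacent free triangles. The data layout and function names are illustrative, and the search strategy is an assumption rather than the planner used on the robot.

```python
from collections import deque

def shared_edge(t1, t2):
    """Return the common edge of two triangles, or None if they are not adjacent."""
    common = set(t1) & set(t2)
    return frozenset(common) if len(common) == 2 else None

def free_space_graph(triangles, free, obstacle_edges):
    """Connect free triangles that share an edge not occupied by a segment."""
    blocked = {frozenset(e) for e in obstacle_edges}
    graph = {i: [] for i in free}
    ordered = sorted(free)
    for a in ordered:
        for b in ordered:
            if a < b:
                edge = shared_edge(triangles[a], triangles[b])
                if edge is not None and edge not in blocked:
                    graph[a].append(b)
                    graph[b].append(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search over triangles; returns a list of indices or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in graph.get(current, []):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None
```

A trajectory could then be derived from the returned chain, for instance by passing through the midpoints of the crossed edges, although that step is not shown here.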

We show a variety of real examples in which our robot navigates freely in real indoor environments using this system.

### Iterative Point Matching for Registration of Free-Form Curves and Surfaces

Z. Zhang. International Journal of Computer Vision, Vol.13, No.2, pages 119-152, 1994.

A heuristic method has been developed for registering two sets of 3-D curves obtained by using an edge-based stereo system, or two dense 3-D maps obtained by using a correlation-based stereo system. Geometric matching in general is a difficult unsolved problem in computer vision. Fortunately, in many practical applications, some a priori knowledge exists which considerably simplifies the problem. In visual navigation, for example, the motion between successive positions is usually approximately known. From this initial estimate, our algorithm computes observer motion with very good precision, which is required for environment modeling (e.g., building a Digital Elevation Map). Objects are represented by a set of 3-D points, which are considered as the samples of a surface. No constraint is imposed on the form of the objects. The proposed algorithm is based on iteratively matching points in one set to the closest points in the other. A statistical method based on the distance distribution is used to deal with outliers, occlusion, appearance and disappearance, which allows us to do subset-subset matching. A least-squares technique is used to estimate 3-D motion from the point correspondences, which reduces the average distance between points in the two sets. Both synthetic and real data have been used to test the algorithm, and the results show that it is efficient and robust, and yields an accurate motion estimate.

Keywords: Free-Form Curve and Surface Matching, 3-D Registration, Motion Estimation, Dynamic Scene Analysis, 3-D Vision.
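The iterative scheme described in this abstract (match each point to its closest point, discard unlikely matches using the distance distribution, re-estimate the motion by least squares) can be sketched in 2D with the standard library only. The sketch makes several simplifying assumptions: it works in the plane rather than in 3-D, uses a brute-force closest-point search, and rejects matches whose distance exceeds the mean plus `k` standard deviations, which is a much cruder rule than the statistical method of the paper. All names are illustrative.

```python
import math

def transform(theta, tx, ty, p):
    """Apply a 2D rotation by theta followed by a translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def closest_point(p, points):
    return min(points, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def fit_rigid_2d(pairs):
    """Closed-form least-squares rigid motion mapping each p onto its q."""
    n = len(pairs)
    cpx = sum(p[0] for p, _ in pairs) / n
    cpy = sum(p[1] for p, _ in pairs) / n
    cqx = sum(q[0] for _, q in pairs) / n
    cqy = sum(q[1] for _, q in pairs) / n
    s_cos = s_sin = 0.0
    for p, q in pairs:
        px, py = p[0] - cpx, p[1] - cpy
        qx, qy = q[0] - cqx, q[1] - cqy
        s_cos += px * qx + py * qy
        s_sin += px * qy - py * qx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cqx - (c * cpx - s * cpy), cqy - (s * cpx + c * cpy)

def icp_2d(src, dst, iterations=20, k=2.0):
    """Estimate the motion taking src onto dst, starting from the identity."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        matches = []
        for p in src:
            moved = transform(theta, tx, ty, p)
            q = closest_point(moved, dst)
            matches.append((p, q, math.hypot(moved[0] - q[0], moved[1] - q[1])))
        dists = [d for _, _, d in matches]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        # Keep only plausible matches; this is what allows subset-subset matching.
        kept = [(p, q) for p, q, d in matches if d <= mean + k * std]
        if len(kept) < 2:
            break
        # Re-estimate the total motion from the original source points.
        theta, tx, ty = fit_rigid_2d(kept)
    return theta, tx, ty
```

As the abstract notes, the approach relies on a reasonable initial estimate: the identity is used here, so the sketch is only expected to work when the true motion is small relative to the point spacing.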

### Token Tracking in a Cluttered Scene

Z. Zhang. International Journal of Image and Vision Computing, Vol.12, No.2, pages 110-120, 1994.

The statistical data association technique is an important approach to analyze long sequences of images in Computer Vision. Although it has extensively been studied in other domains such as in radar imagery, it was introduced only recently in Computer Vision, and is already recognized as an efficient approach to solving correspondence and motion problems. This paper has two purposes. The first is to present a general formulation of token tracking. The parameterization of tokens is not addressed. This might be useful to those who are not familiar with statistical tracking techniques. The second is to introduce some strategies for tracking with emphasis on practical importance. They include beam search for resolving multiple matches, support of existence for discarding false matches, and locking on reliable tokens and maximizing local rigidity for handling combinatorial explosion. We have implemented those strategies in a 3D line segment tracking algorithm and found them very useful.

Keywords: Token Tracking, Matching, Cluttered Scenes, Search Strategies

### Finding Planes and Clusters of Objects from 3D Line Segments with Application to 3D Motion Determination

Z. Zhang and O.D. Faugeras. CVGIP: Image Understanding, Vol.60, No.3, pages 267-284, November 1994.

We address in this paper how to find clusters based on proximity and planar facets based on coplanarity from 3D line segments obtained from stereo. The proposed methods are efficient and have been tested on a large amount of real stereo data. These procedures are indispensable in many applications including scene interpretation, object modeling and object recognition. We show their application to 3D motion determination. We have developed an algorithm based on the hypothesize-and-verify paradigm to register two consecutive 3D frames obtained from stereo and estimate their transformation/motion. By grouping 3D line segments in each frame into clusters and planes, we can effectively reduce the complexity of the hypothesis generation phase.

Keywords: Grouping, Line segments, Planes, Clusters, Uncertainty, Motion estimation.

### Recalage de deux nuages de points 3D

Z. Zhang. Traitement du signal, Vol.10, No.4, pages 263-281, 1993.

A method has been developed for registering two 3D point clouds obtained using correlation-based stereo. Registering two sets of geometric primitives is in general a very difficult and unsolved problem. Fortunately, in many applications, a priori knowledge considerably simplifies the problem. For example, the motion between two successive positions is usually either small or approximately known. Starting from this rough estimate, our algorithm computes the motion with very good precision, which is necessary for obtaining a satisfactory model of the environment. The observed objects are represented by 3D point clouds. These points are considered as samples of a surface. No constraint is imposed a priori on the form of the objects. The proposed algorithm is based on iteratively matching the points of one view with their nearest neighbors in the other view. A statistical method based on the distance distribution is used to eliminate outlier matches. A least-squares technique is used to estimate the 3D motion from the point correspondences. Applying this motion reduces the average distance between the surfaces in the two sets. Real data have been used to test this algorithm. The results show that it is efficient and robust, and that it yields an accurate motion estimate.

Keywords: Surface Matching, 3D Registration, Motion Estimation, Dynamic Scene Analysis, Robot Vision

### Le problème de la mise en correspondance: L'état de l'art

Z. Zhang. Research Report, No.2146, INRIA Sophia-Antipolis, Dec. 1993.

The problem of matching is one of the bottlenecks in computer vision. We identify three categories of matching: stereovision, object recognition, and image sequence analysis. This report tries to provide a complete survey of the works reported in the literature, with emphasis on matching between two (2D or 3D) images in a sequence.

Le problème de la mise en correspondance est l'un des problèmes les plus difficiles en vision par ordinateur. Nous identifions trois catégories de mise en correspondance : stéréovision, reconnaissance d'objets, et analyse de séquences d'images. Ce rapport vise à faire une revue complète sur l'ensemble de travaux dans la littérature avec une attention particulière sur la mise en correspondance entre deux images au sein d'une séquence, bidimensionnelles ou tridimensionnelles.

Keywords: Matching, Stereovision, Object Recognition, Image Sequence Analysis; Mise en correspondance, Stéréovision, Reconnaissance d'objets, Séquences d'images

### Motion and Structure of Four Points from One Motion of a Stereo Rig with Unknown Extrinsic Parameters

Z. Zhang. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.17, No.12, pages 1222-1227, December 1995. Short Version in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 556-561, New York, June 1993.

We describe an analytical method for recovering 3-D motion and structure of four or more points from one motion of a stereo rig. The extrinsic parameters are unknown. The motion of the stereo rig is also unknown. Because it exploits information redundancy, the approach gains over the traditional "motion and structure from motion" approach in that fewer features and fewer motions are required, and thus a more robust estimation of motion and structure can be obtained. Since the constraint on the rotation matrix is not fully exploited in the analytical method, nonlinear minimization can be used to improve the result. Both computer-simulated data and real data are used to validate the proposed algorithm, and very promising results are obtained.

Keywords: Stereovision, Structure from motion, Reconstruction, Calibration.

### Motion of an Uncalibrated Stereo Rig: Self-Calibration and Metric Reconstruction

Z. Zhang, Q.-T. Luong, and O. Faugeras. IEEE Trans. Robotics and Automation Vol.12, No.1, pages 103-113, February 1996. Short version in Proc. the 12th International Conference on Pattern Recognition pages 695-697, Jerusalem, Israel, October 1994. Also INRIA Research Report No.2079, 1993, revised June 1994.

We address in this paper the problem of self-calibration and metric reconstruction (up to a scale) from one unknown motion of an uncalibrated stereo rig, assuming the coordinates of the principal point of each camera are known (this assumption is not necessary if one more motion is available). The epipolar constraint is first formulated for two uncalibrated images. The problem then becomes one of estimating unknowns such that the discrepancy from the epipolar constraint, in terms of distances between points and their corresponding epipolar lines, is minimized. The initialization of the unknowns is based on the work of Maybank, Luong and Faugeras on self-calibration of a single moving camera, which requires solving a set of so-called Kruppa equations. Redundancy of the information contained in a sequence of stereo images makes this method more robust than using a sequence of monocular images. Real data have been used to test the proposed method, and the results obtained are quite good.

Keywords: Camera Calibration, Stereovision, Reconstruction, Self-calibration.
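The criterion mentioned in this abstract, the distance between points and their corresponding epipolar lines, can be written down in a few lines. The sketch below assumes the convention that a point `m1` in the first image maps to the epipolar line `F m1` in the second image, and it sums the squared distances in both images so that the two views are treated symmetrically. The symmetric form and the function names are assumptions of this example, not details taken from the paper.

```python
import math

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def point_line_distance(point, line):
    """Distance from a 2D point (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)

def epipolar_cost(f, matches):
    """Sum of squared point-to-epipolar-line distances over all matches.

    `f` is a 3x3 fundamental matrix (nested lists) and `matches` is a list
    of ((x1, y1), (x2, y2)) pixel correspondences.
    """
    f_t = transpose(f)
    total = 0.0
    for m1, m2 in matches:
        line_in_2 = mat_vec(f, [m1[0], m1[1], 1.0])
        line_in_1 = mat_vec(f_t, [m2[0], m2[1], 1.0])
        total += point_line_distance(m2, line_in_2) ** 2
        total += point_line_distance(m1, line_in_1) ** 2
    return total
```

For a pure horizontal translation between the two views, the epipolar lines are the image rows, so the cost reduces to the squared vertical disparity counted once per image.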

### A Robust Technique for Matching Two Uncalibrated Images Through the Recovery of the Unknown Epipolar Geometry

Z. Zhang, R. Deriche, O. Faugeras, Q.-T. Luong. Artificial Intelligence Journal, Vol.78, pages 87-119, October 1995. Also Research Report No.2273, INRIA Sophia-Antipolis.

This paper proposes a robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint. The images are uncalibrated, that is, the motion between them and the camera parameters are not known. Thus, the images can be taken by different cameras or by a single camera at different time instants. If we make an exhaustive search for the epipolar geometry, the complexity is prohibitively high. The idea underlying our approach is to use classical techniques (correlation and relaxation methods in our particular implementation) to find an initial set of matches, and then use a robust technique, the Least Median of Squares (LMedS), to discard false matches in this set. The epipolar geometry can then be accurately estimated using a meaningful image criterion. More matches are eventually found, as in stereo matching, by using the recovered epipolar geometry. A large number of experiments have been carried out, and very good results have been obtained. Regarding the relaxation technique, we define a new measure of matching support, which allows a higher tolerance to deformation with respect to rigid transformations in the image plane and a smaller contribution for distant matches than for nearby ones. A new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity. The update strategy is different from the classical "winner-take-all", which is easily stuck at a local minimum, and also from "loser-take-nothing", which is usually very slow. The proposed algorithm has been widely tested and works remarkably well in a scene with many repetitive patterns.

Keywords: Robust Matching, Epipolar Geometry, Fundamental Matrix, Least Median of Squares (LMedS), Relaxation, Correlation.
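To make the Least Median of Squares idea concrete, the sketch below applies it to a much simpler model than the epipolar geometry: a 2D line fitted to points contaminated by outliers. Candidate lines are generated from minimal subsets (pairs of points), each is scored by the median of its squared residuals, and the best one is kept. For determinism the example enumerates all pairs instead of drawing random subsets, which is only practical for small data sets. The inlier threshold uses the robust standard deviation commonly associated with LMedS, `1.4826 * (1 + 5 / (n - p)) * sqrt(median)`, and `min_sigma` is a hypothetical floor added to keep the threshold positive on noise-free data.

```python
import math
from itertools import combinations
from statistics import median

def line_through(p, q):
    """Normalized line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    a, b = q[1] - p[1], p[0] - q[0]
    n = math.hypot(a, b)
    return a / n, b / n, -(a * p[0] + b * p[1]) / n

def residual(line, point):
    a, b, c = line
    return a * point[0] + b * point[1] + c

def lmeds_line(points):
    """Return (median squared residual, line) minimizing the median."""
    best = None
    for p, q in combinations(points, 2):
        if p == q:
            continue
        line = line_through(p, q)
        score = median([residual(line, pt) ** 2 for pt in points])
        if best is None or score < best[0]:
            best = (score, line)
    return best

def select_inliers(points, line, score, n_params=2, min_sigma=1e-6):
    """Keep points whose residual is within 2.5 robust standard deviations."""
    sigma = 1.4826 * (1.0 + 5.0 / (len(points) - n_params)) * math.sqrt(score)
    sigma = max(sigma, min_sigma)
    return [pt for pt in points if abs(residual(line, pt)) <= 2.5 * sigma]
```

Because the score is a median, the estimate is unaffected as long as fewer than half of the points are outliers. In the matching setting of the abstract, the points would be candidate matches and the model a fundamental matrix.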

### Estimating Motion and Structure from Correspondences of Line Segments Between Two Perspective Images

Z. Zhang. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.17, No.12, pages 1129-1139, December 1995. Short Version in Proc. 5th Int'l Conf. Computer Vision (ICCV), pages 257-262, Cambridge, Massachusetts, USA, June 1995. Also Research Report No.2340, INRIA Sophia-Antipolis.

We present in this paper an algorithm for determining 3D motion and structure from correspondences of *line segments* between two perspective images. To our knowledge, this paper is the first investigation of the use of line segments in motion and structure from motion. Classical methods use their geometric abstraction, namely straight lines, but then three images are necessary for the motion and structure determination process. In this paper we show that two views are in general sufficient when we use line segments. The assumption we use is that two matched line segments contain the projection of a *common part* of the corresponding line segment in space. Indeed, this is what we use to match line segments between different views. Both synthetic and real data have been used to test the proposed algorithm, and excellent results have been obtained with real data containing a relatively large set of line segments. The results are comparable with those obtained using calibrated stereo.

Keywords: Motion, Structure from Motion, Line Segments, Epipolar Geometry, Overlap, Dynamic Scene Analysis.

### An Effective Technique for Calibrating a Binocular Stereo Through Projective Reconstruction Using Both a Calibration Object and the Environment

Z. Zhang, O. Faugeras, R. Deriche. Videre: A Journal of Computer Vision Research (MIT Press), Vol.1, No.1, pages 58-68, 1997.
Also in Proceedings of Europe-China Workshop on Geometrical modelling and Invariants for Computer Vision, pages 253-260, April 1995, Xi'an, China.

We present a novel technique for calibrating a binocular stereo rig by using the information from both scenes and classical calibration objects. The calibration provided by the classical methods is only valid for the space near the position of the calibration object. Our technique takes advantage of the rigidity of the geometry between two cameras. The idea is to first estimate precisely the epipolar geometry, which is valid for a wide range in space, from all available matches. This allows us to conduct a projective reconstruction. Using the a priori knowledge of the calibration object, we are eventually able to calibrate the stereo rig in a Euclidean space. The proposed technique has been tested with a number of real images, and significant improvement has been observed.

Keywords: Camera Calibration, Stereovision, Epipolar Geometry, Projective Reconstruction.

### Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting

Z. Zhang. to appear in International Journal of Image and Vision Computing, 1996. Also Research Report No.2676, INRIA Sophia-Antipolis, October 1995.

Almost all problems in computer vision are related in one form or another to the problem of estimating parameters from noisy data. In this tutorial, we present what are probably the most commonly used techniques for parameter estimation. These include linear least-squares (pseudo-inverse and eigen analysis); orthogonal least-squares; gradient-weighted least-squares; bias-corrected renormalization; Kalman filtering; and robust techniques (clustering, regression diagnostics, M-estimators, least median of squares). Particular attention has been devoted to discussions about the choice of appropriate minimization criteria and the robustness of the different techniques. Their application to conic fitting is described.

Keywords: Parameter estimation, Least-squares, Bias correction, Kalman filtering, Robust regression.
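As a small worked instance of linear least-squares applied to conic fitting, the sketch below fits a circle, a special conic, by writing it as `x^2 + y^2 + D*x + E*y + F = 0`. Fixing the coefficient of `x^2 + y^2` to 1 is one particular choice of normalization that makes the problem linear in `(D, E, F)`, and the normal equations are then solved by Gaussian elimination. This is an algebraic fit chosen for brevity, not a recommendation drawn from the tutorial, and the helper names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_circle(points):
    """Algebraic least-squares circle fit; returns (cx, cy, radius)."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in points:
        row = (x, y, 1.0)          # one row of the design matrix
        rhs = -(x * x + y * y)     # corresponding right-hand side
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = solve_linear(ata, atb)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)
```

The fit minimizes an algebraic residual rather than the geometric distance to the circle, so with noisy data the two criteria generally give different answers. That difference is the kind of issue the choice of minimization criterion is about.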

### A Stereovision System for a Planetary Rover: Calibration, Correlation, Registration, and Fusion

Z. Zhang. In Proc. IEEE Workshop on Planetary Rover Technology and Systems, April 1996, Minneapolis, Minnesota, USA.

This paper describes a complete stereovision system, which was originally developed for planetary applications, but can be used for other applications such as object modeling. A new effective on-site calibration technique has been developed, which can make use of the information from the surrounding environment as well as the information from the calibration apparatus. A correlation-based stereo algorithm is used, which can produce sufficiently dense range maps with an algorithmic structure for fast implementations. A technique based on iterative closest-point matching has been developed for registration of successive depth maps and computation of the displacements between successive positions. A statistical method based on the distance distribution is integrated into this registration technique, which allows us to deal with important problems such as outliers, occlusion, appearance and disappearance. Finally, the registered maps are expressed in the same coordinate system and are fused, erroneous data are eliminated through consistency checking, and a global digital elevation map is built incrementally.


### A New Multistage Approach to Motion and Structure Estimation: From Essential Parameters to Euclidean Motion Via Fundamental Matrix

Z. Zhang. Research Report, No.2910, INRIA Sophia-Antipolis, June 1996.

The classical approach to the motion and structure estimation problem from two perspective projections consists of two stages: (i) using the 8-point algorithm to estimate the 9 essential parameters defined up to a scale factor, which is a linear estimation problem; (ii) refining the motion estimation based on some statistically optimal criteria, which is a nonlinear estimation problem on a five-dimensional space. Unfortunately, the results obtained using this approach are often not satisfactory, especially when the motion is small or when the observed points are close to a degenerate surface (e.g. plane). The problem is that the second stage is very sensitive to the initial guess, and that it is very difficult to obtain a precise initial estimate from the first stage. This is because we perform a projection of a set of quantities which are estimated in a space of 8 dimensions, much higher than that of the real space which is five-dimensional. We propose in this paper a novel approach by introducing an intermediate stage which consists in estimating a $3\times 3$ matrix defined up to a scale factor by imposing the *zero-determinant constraint* (the matrix has seven independent parameters, and is known as the fundamental matrix). The idea is to *gradually* project parameters estimated in a high dimensional space onto a *slightly lower* space, namely from 8 dimensions to 7 and finally to 5. The proposed approach has been tested with synthetic and real data, and a considerable improvement has been observed for the delicate situations mentioned above. Our conjecture from this work is that the imposition of the constraints arising from projective geometry should be used as an intermediate step in order to obtain reliable 3D Euclidean motion and structure estimation from multiple calibrated images.

Keywords: Motion Analysis, Structure from Motion, Gradual Constraint Enforcing, Multistage Algorithm.
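The parameter counts in this abstract can be summarized by the constraints imposed at each stage. The notation below is an assumption made for illustration: $\tilde{x}_1$ and $\tilde{x}_2$ are matched points in normalized homogeneous image coordinates, $M$ is the $3\times 3$ matrix being estimated, and $R$ and $t$ are the rotation and the translation direction between the two views.

$$
\begin{aligned}
\text{Stage 1:}\quad & \tilde{x}_2^{T} M\, \tilde{x}_1 = 0, && M \in \mathbb{R}^{3\times 3}\ \text{up to scale} && (8\ \text{parameters})\\
\text{Stage 2:}\quad & \det M = 0 && && (7\ \text{parameters})\\
\text{Stage 3:}\quad & M = [t]_\times R, && R^{T}R = I,\ \det R = 1,\ \|t\| = 1 && (5\ \text{parameters})
\end{aligned}
$$

Each stage keeps the epipolar equation of the previous one and adds constraints, so the estimate from one stage serves as the initial guess for the next. The five parameters of the last stage are the three of the rotation plus the two of the translation direction.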


### Characterizing the Uncertainty of the Fundamental Matrix

G. Csurka, C. Zeller, Z. Zhang, and O. Faugeras. Computer Vision and Image Understanding, Vol.68, No.1, pages 18-36, October 1997.
Updated version of INRIA Research Report No.2560, 1995.

Keywords: Epipolar geometry, Fundamental matrix, Uncertainty, Covariance matrix, Epipolar band.

### Determining the Epipolar Geometry and its Uncertainty: A Review

Z. Zhang. Research Report, No.2927, INRIA Sophia-Antipolis, July 1996.

Keywords: Epipolar Geometry, Fundamental Matrix, Calibration, Reconstruction, Parameter Estimation, Robust Techniques, Uncertainty Characterization, Performance Evaluation, Software


### On the Epipolar Geometry Between Two Images With Lens Distortion

Z. Zhang. In Proceedings of International Conference on Pattern Recognition, Vol. I, pages 407-411, Vienna, Austria, August 1996.

In order to achieve a 3D, either Euclidean or projective, reconstruction with high precision, one has to consider lens distortion. In almost all work on multiple-views problems in computer vision, a camera is modeled as a pinhole. Lens distortion has usually been corrected off-line. This paper intends to consider lens distortion as an integral part of a camera. We first describe the epipolar geometry between two images with lens distortion. For a point in one image, its corresponding point in the other image should lie on a so-called epipolar curve. We then investigate the possibility of estimating the distortion parameters and the fundamental matrix based on the generalized epipolar constraint. Experimental results with computer simulation show that the distortion parameters can be estimated correctly if the noise in image points is low and the lens distortion is severe. Otherwise, it is better to treat the cameras as being distortion-free.

Keywords: Camera Calibration, Lens Distortion, Epipolar Geometry, Fundamental Matrix, Epipolar Curve

### Self-Maintaining Camera Calibration Over Time

Z. Zhang and V. Schenk. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'97), pages 231-236, Puerto Rico, June 17-19, 1997.

The success of an intelligent robotic system depends on the performance of its vision system, which in turn depends to a great extent upon the quality of its calibration. During the execution of a task, the vision system is subject to external influences such as vibrations and thermal expansion, which affect and possibly invalidate the initial calibration. Moreover, parameters of the vision system such as the zoom or the focus may be altered intentionally in order to perform specific vision tasks. This paper describes a technique for automatically maintaining the calibration of stereovision systems over time without again using any particular calibration apparatus. It uses all available information, i.e., both spatial and temporal data. Uncertainty is systematically manipulated and maintained. Synthetic and real data are used to validate the proposed technique, and the results compare very favourably with those given by classical calibration methods.

Keywords: Camera calibration, Calibration maintaining, Dynamic vision, Pose determination, 3D vision.

### A General Expression of the Fundamental Matrix for Both Perspective and Affine Cameras

Z. Zhang and G. Xu. In Proc. Fifteenth International Joint Conference on Artificial Intelligence (IJCAI'97), pages 1502-1507, Nagoya, Japan, August 23-29, 1997.

This paper addresses the recovery of structure and motion from two uncalibrated images of a scene under full perspective or under affine projection. Epipolar geometry, projective reconstruction, and affine reconstruction are elaborated in such a way that anyone with knowledge of linear algebra can understand the discussion without difficulty. A general expression of the fundamental matrix is derived which is valid for any projection model without lens distortion (including full perspective and affine camera). A new technique for affine reconstruction from two affine images is developed, which consists in first estimating the affine epipolar geometry and then performing a triangulation for each point match with respect to an implicit common affine basis. This technique is very efficient.

Keywords: Motion Analysis, Epipolar Geometry, Uncalibrated Images, Non-Metric Vision, 3D Reconstruction, Fundamental Matrix.
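As a reminder of the objects involved, the standard epipolar constraint and the well-known special form of the fundamental matrix for affine cameras can be written as below. This is textbook background rather than the general expression derived in the paper, and the symbols ($\tilde{\mathbf{m}}$, $\tilde{\mathbf{m}}'$, $F$, $F_A$, $a,\dots,e$) are notation assumed here for illustration.

$$
\tilde{\mathbf{m}}'^{\top} F \,\tilde{\mathbf{m}} = 0,
\qquad
\tilde{\mathbf{m}} = (x, y, 1)^{\top},\quad
\tilde{\mathbf{m}}' = (x', y', 1)^{\top},\quad
\operatorname{rank}(F) = 2 .
$$

$$
F_A =
\begin{pmatrix}
0 & 0 & a \\
0 & 0 & b \\
c & d & e
\end{pmatrix}
\quad\Longrightarrow\quad
\tilde{\mathbf{m}}'^{\top} F_A \,\tilde{\mathbf{m}}
= a x' + b y' + c x + d y + e = 0 .
$$

In the affine case the constraint is linear in the image coordinates, so all epipolar lines in an image are parallel and the epipoles lie at infinity.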

### Modeling Geometric Structure and Illumination Variation of a Scene from Real Images

Z. Zhang. In Proc. International Conference on Computer Vision (ICCV'98), Bombay, India, January 4-7, 1998.

We present in this paper a system which automatically builds, from real images, a scene model containing both 3D geometric information of the scene structure and its photometric information under various illumination conditions. The geometric structure is recovered from images taken from distinct viewpoints. Structure-from-motion and correlation-based stereo techniques are used to match pixels between images of different viewpoints and to reconstruct the scene in 3D space. The photometric property is extracted from images taken under different illumination conditions (orientation, position and intensity of the light sources). This is achieved by computing a low-dimensional linear space of the spatio-illumination volume, and is represented by a set of basis images. The model that has been built can be used to create realistic renderings from different viewpoints and illumination conditions. Applications include object recognition, virtual reality and product advertisement.

Keywords: Geometric modeling, Representation, 3D reconstruction, Shading (illumination), CAD/CAM, Virtual reality, Rendering, Object recognition.
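The idea of representing illumination variation by a small set of basis images can be sketched with a singular value decomposition. The sketch below is only an illustration under assumptions: the function names, the use of a plain SVD without mean subtraction, and the array layout are choices made here, not details confirmed by the abstract.

```python
import numpy as np

def illumination_basis(images, k):
    """Compute k basis images spanning a low-dimensional linear space.

    images: array of shape (n_images, height, width), the same viewpoint
            under different illumination conditions (assumed layout).
    Returns an array of shape (k, height, width) with orthonormal basis images.
    """
    n, h, w = images.shape
    X = images.reshape(n, h * w).astype(float)   # one image per row
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].reshape(k, h, w)               # leading right singular vectors

def project(basis, image):
    """Coefficients of an image in the (orthonormal) basis."""
    k = basis.shape[0]
    return basis.reshape(k, -1) @ image.ravel().astype(float)

def relight(basis, coeffs):
    """Synthesize an image as a linear combination of the basis images."""
    return np.tensordot(coeffs, basis, axes=1)
```

An image taken under a new illumination condition would then be approximated by `relight(basis, project(basis, image))`; how well this works depends on how close the scene is to the low-dimensional assumption.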

### Euclidean Structure from Uncalibrated Images Using Fuzzy Domain Knowledge: Application to Facial Images Synthesis

Z. Zhang, K. Isono, and S. Akamatsu. In Proc. International Conference on Computer Vision (ICCV'98), Bombay, India, January 4-7, 1998.

Uncalibrated images have found many applications, such as image synthesis. However, it is not easy to specify the desired position of the new image in projective or affine space. This paper proposes to recover Euclidean structure from uncalibrated images using domain knowledge such as distances and angles. The knowledge we have is usually about an object category, but not very precise for the particular object being considered. The variation (fuzziness) is modeled as a Gaussian variable. Six types of common knowledge are formulated. Once we have a Euclidean description, the task of specifying the desired position in Euclidean space becomes trivial. The proposed technique is then applied to the synthesis of new facial images. A number of difficulties existing in image synthesis are identified and solved. For example, we propose to use edge points to deal with occlusion.

Keywords: 3D reconstruction, uncalibrated images, image synthesis, representation, fuzzy domain knowledge.

### Understanding the Relationship Between the Optimization Criteria in Two-View Motion Analysis

Z. Zhang. In Proc. International Conference on Computer Vision (ICCV'98), Bombay, India, January 4-7, 1998.

The three best known criteria in two-view motion analysis are based, respectively, on the distances between points and their corresponding epipolar lines, on the gradient-weighted epipolar errors, and on the distances between points and the reprojections of their reconstructed points. The last one has a better statistical interpretation, but is, however, much slower than the first two. In this paper, we show that the last two criteria are equivalent when the epipoles are at infinity, and differ from each other only a little even when the epipoles are in the image. The first two criteria are equivalent only when the epipoles are at infinity and when the observed object has the same scale in the two images. This suggests that the second criterion is sufficient in practice because of its computational efficiency. The result is valid for both calibrated and uncalibrated images.

Keywords: Motion analysis, multiple-view geometry, 3D reconstruction, optimization criteria, comparison.
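The first two criteria can be made concrete with a short sketch. It assumes the usual formulation in which a fundamental matrix `F` satisfies `m2^T F m1 = 0` for matched pixels, and takes the gradient-weighted epipolar error in its common first-order form (often referred to as the Sampson error). The function name and argument layout are hypothetical. The third criterion is omitted because it requires reconstructing and reprojecting each 3D point.

```python
import numpy as np

def epipolar_errors(F, m1, m2):
    """Squared errors of one point match under the first two criteria.

    F:  3x3 fundamental matrix such that m2^T F m1 = 0 (assumed convention).
    m1: (x, y) pixel in the first image; m2: (x, y) pixel in the second.
    Returns (symmetric point-to-epipolar-line distance squared,
             gradient-weighted epipolar error squared).
    """
    p1 = np.array([m1[0], m1[1], 1.0])
    p2 = np.array([m2[0], m2[1], 1.0])
    l2 = F @ p1            # epipolar line of m1 in the second image
    l1 = F.T @ p2          # epipolar line of m2 in the first image
    r = p2 @ l2            # algebraic residual m2^T F m1
    n1 = l1[0] ** 2 + l1[1] ** 2
    n2 = l2[0] ** 2 + l2[1] ** 2
    dist_sq = r ** 2 * (1.0 / n1 + 1.0 / n2)   # criterion 1
    grad_sq = r ** 2 / (n1 + n2)               # criterion 2
    return dist_sq, grad_sq

# Pure horizontal translation: epipoles at infinity, same scale in both images.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
d, g = epipolar_errors(F, (10.0, 5.0), (20.0, 7.0))
print(d, g)  # 8.0 2.0
```

In this configuration the two values differ only by a constant factor of 4 for every match, so minimizing either one gives the same solution, which is consistent with the equivalence condition stated in the abstract.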

### A New Multistage Approach to Motion and Structure Estimation by Gradually Enforcing Geometric Constraints

Z. Zhang. In Proc. 3rd Asian Conference on Computer Vision (ACCV'98), pages 567-574, Hong Kong, January 8-11, 1998.

The standard 2-stage algorithm first estimates the 9 essential parameters defined up to a scale factor and then refines the motion estimation based on some statistically optimal criteria. We propose in this paper a novel approach by introducing an intermediate stage which consists in estimating a $3\times 3$ matrix defined up to a scale factor by imposing the *rank-2 constraint* (the matrix has seven independent parameters). The idea is to *gradually* project parameters estimated in a high dimensional space onto a *slightly lower*-dimensional space, namely from 8 dimensions to 7 and finally to 5. Experiments with synthetic and real data show a considerable improvement over the 2-stage algorithm. Our conjecture from this work is that the imposition of the constraints arising from projective geometry should be used as an intermediate step in order to obtain reliable 3D Euclidean motion and structure estimation from multiple calibrated images.

Keywords: Motion and stereo, 3D reconstruction, Structure from motion, Multiple-view geometry, Gradual constraint enforcing.
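The sequence of spaces (8, then 7, then 5 dimensions) can be illustrated with the textbook SVD projections below. This is a simplified stand-in, not the paper's method: the abstract describes estimating the matrix under each constraint, whereas this sketch only projects an already estimated matrix onto the nearest matrix satisfying the constraint in the Frobenius norm. Function names are hypothetical.

```python
import numpy as np

def nearest_rank2(M):
    """Closest rank-2 matrix to M: zero the smallest singular value.

    A 3x3 matrix up to scale has 8 degrees of freedom; rank 2 leaves 7.
    """
    U, s, Vt = np.linalg.svd(M)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

def nearest_essential(M):
    """Closest essential matrix to M: two equal singular values, one zero.

    An essential matrix up to scale has 5 degrees of freedom.
    """
    U, s, Vt = np.linalg.svd(M)
    sigma = 0.5 * (s[0] + s[1])
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Gradual enforcement on a noisy linear estimate M (placeholder values).
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
F2 = nearest_rank2(M)        # 8 -> 7 degrees of freedom
E = nearest_essential(F2)    # 7 -> 5 degrees of freedom
```

In practice each stage would be followed by a refinement against the image data, with the previous stage supplying the initial guess.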

### Image-Based Geometrically-Correct Photorealistic Scene/Object Modeling (IBPhM): A Review (Invited talk)

Z. Zhang. In Proc. 3rd Asian Conference on Computer Vision (ACCV'98), pages 340-349, Hong Kong, January 8-11, 1998.

There is emerging interest in both the computer vision and computer graphics communities in obtaining photorealistic models of a scene or an object from real images. This paper presents a tentative review of the computer vision techniques used in such modeling which guarantee that the generated views are geometrically correct. The topics covered include mosaicking for building environment maps, CAD-like modeling for building 3D geometric models together with texture maps extracted from real images, image-based rendering for synthesizing new views from uncalibrated images, and techniques for modeling the appearance variation of a scene or an object under different illumination conditions. Major issues and difficulties are addressed.

Keywords: Photorealistic modeling, image-based rendering, multiple-view geometry, photometric models, CAD, camera calibration, 3D reconstruction, uncalibrated images, domain knowledge, illumination variation.

### Comparison Between Geometry-Based and Gabor-Wavelets-Based Facial Expression Recognition Using Multi-Layer Perceptron

Z. Zhang, M. Lyons, M. Schuster, S. Akamatsu. In Proc. 3rd IEEE International Conference on Automatic Face and Gesture Recognition (FG'98), Nara, Japan, April 14-16, 1998.

In this paper, we investigate the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale and multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. The architecture we developed is based on a two-layer perceptron. The recognition performance with different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, we have also studied the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions.

Keywords: Facial expression recognition, learning, Gabor wavelets, multilayer perceptron.
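The role of the hidden layer as a nonlinear dimensionality reduction can be seen in a minimal forward pass of a two-layer perceptron. The sketch makes several assumptions not stated in the abstract: a tanh hidden activation, a softmax output, and placeholder sizes for the input features and the expression classes.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer perceptron.

    x:  feature vector (geometric positions and/or Gabor coefficients).
    W1, b1: first layer, mapping features to a few hidden units.
    W2, b2: second layer, mapping hidden units to class scores.
    Returns (hidden activations, class probabilities).
    """
    hidden = np.tanh(W1 @ x + b1)          # low-dimensional representation
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return hidden, e / e.sum()

# Placeholder sizes: 10 input features, 6 hidden units, 7 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 10)), np.zeros(6)
W2, b2 = rng.normal(size=(7, 6)), np.zeros(7)
hidden, probs = mlp_forward(rng.normal(size=10), W1, b1, W2, b2)
```

Varying the number of rows of `W1` and retraining is one way to probe how many hidden units are needed, which is the question the abstract raises.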