3D position, attitude and shape input using video tracking of hands and lips

Andrew Blake and Michael Isard


Recent developments in video tracking allow the outlines of moving, natural objects in a video-camera input stream to be tracked live, at full video rate. Previous systems have done this for specially illuminated objects or for naturally illuminated but polyhedral objects. Other systems have been able to track non-polyhedral objects in motion, in some cases from live video, but they follow only centroids or key points rather than whole curves. The system described here accurately tracks the curved silhouettes of moving non-polyhedral objects (for example hands, lips, legs, vehicles and fruit) at frame rate, with no special hardware beyond a desktop workstation, a video camera and a framestore. The new algorithms are a synthesis of methods in deformable models, B-spline curve representation and control theory. This paper shows how such a facility can be used to turn parts of the body, for instance hands and lips, into input devices. Rigid motion of a hand can be used as a 3D mouse, with non-rigid gestures signalling a button press or the “lifting” of the mouse. Both rigid and non-rigid motions of lips can be tracked independently and used as inputs, for example to animate a computer-generated face.
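
As a rough illustration of the B-spline curve representation mentioned in the abstract, the sketch below evaluates a closed outline from a handful of control points and applies a planar rigid motion with scaling to it. It is not the authors' tracker: the function names (`cubic_basis`, `closed_bspline`, `similarity`), the uniform cubic basis and the sample control points are all assumptions made for this example. The point it shows is that a whole silhouette can be described by a few numbers, and that moving the control points moves the curve in the same way.

```python
import math


def cubic_basis(t):
    """Uniform cubic B-spline blending weights for t in [0, 1); they sum to 1."""
    return (
        (1 - t) ** 3 / 6.0,
        (3 * t**3 - 6 * t**2 + 4) / 6.0,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
        t**3 / 6.0,
    )


def closed_bspline(control_points, samples_per_span=8):
    """Sample a closed (periodic) cubic B-spline outline.

    control_points: list of (x, y) pairs; indices wrap around to close the curve.
    Returns len(control_points) * samples_per_span points on the outline.
    """
    n = len(control_points)
    curve = []
    for i in range(n):
        # Each span blends four consecutive control points (with wrap-around).
        span = [control_points[(i + k) % n] for k in range(4)]
        for s in range(samples_per_span):
            w = cubic_basis(s / samples_per_span)
            x = sum(wk * p[0] for wk, p in zip(w, span))
            y = sum(wk * p[1] for wk, p in zip(w, span))
            curve.append((x, y))
    return curve


def similarity(points, dx, dy, angle, scale):
    """Rotate by `angle` (radians), scale, then translate a list of 2D points."""
    c, s = math.cos(angle) * scale, math.sin(angle) * scale
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


# Hypothetical outline: five control points standing in for a tracked silhouette.
template = [(0, 0), (2, 0), (2, 1), (0, 1), (-1, 0.5)]
moved_outline = closed_bspline(similarity(template, 3.0, -1.0, 0.4, 1.5))
```

Because the blending weights sum to one, transforming the control points and then sampling gives the same outline as sampling first and transforming every sample, which is why a tracker built on this kind of representation only needs to estimate the control points (or a few motion parameters acting on them).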


Publication type: Inproceedings
Published in: Proc. ACM Siggraph