SIGGRAPH'98

Panel on Computer Vision in 3D Interactivity

Orlando, FL

July 19-24, 1998

It is commonly believed (but not yet proven) that Virtual Reality (VR) attains its power by captivating the user's attention to induce a sense of immersion. This is usually done with a display that allows the user to look in any direction, and that updates the user's viewpoint by passively tracking the user's head motion. HMDs and CAVEs are examples of these kinds of displays. However, there are other forms of VR where immersion occurs. Fish Tank VR [Deering 1992, Ware et al. 1993] uses passive head tracking, but uses a desktop stereo display rather than surrounding the user visually. Desktop VR [Robertson et al. 1997] is the use of animated interactive 3D graphics to build virtual worlds with desktop displays and without head tracking.

Current HMD-based VR techniques suffer from poor display resolution, display jitter, and lag between head movement and the resulting change to the display. These problems tend to inhibit the illusion of immersion. Fish Tank VR solves the display resolution and jitter problems by using desktop stereo displays. Desktop VR solves all three problems, but at the expense of losing stereo and head tracking. Studies have shown that head-motion parallax is a stronger depth cue than stereopsis. Hence, adding head-motion parallax to a Desktop VR system could bring it quite close to Fish Tank VR capabilities. Computer vision provides a way to do that by tracking the user's head motion without the user wearing any tracking sensors.
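As a rough illustration of what head-motion parallax on a desktop display involves, the sketch below computes an off-axis viewing frustum from a tracked head position. It is not taken from any of the systems cited here; the function name, the coordinate convention (head position measured from the screen center, with z as the distance from the screen plane), and the example dimensions are all assumptions made for illustration.

```python
def off_axis_frustum(head, screen_w, screen_h, near):
    """Return (left, right, bottom, top) extents at the near plane.

    Hypothetical helper: `head` is (x, y, z) in the same units as the
    screen size, measured from the screen center, with z > 0 being the
    distance from the head to the screen plane.
    """
    x, y, z = head
    if z <= 0:
        raise ValueError("head must be in front of the screen (z > 0)")
    # Project the screen edges, as seen from the head, onto the near plane.
    scale = near / z
    left = (-screen_w / 2 - x) * scale
    right = (screen_w / 2 - x) * scale
    bottom = (-screen_h / 2 - y) * scale
    top = (screen_h / 2 - y) * scale
    return left, right, bottom, top


# Head centered in front of an assumed 0.4 x 0.3 screen: symmetric frustum.
centered = off_axis_frustum((0.0, 0.0, 0.6), 0.4, 0.3, 0.1)
# Head moved 0.1 to the right: the frustum skews, which produces parallax.
shifted = off_axis_frustum((0.1, 0.0, 0.6), 0.4, 0.3, 0.1)
```

The resulting extents could be passed to whatever asymmetric perspective projection the rendering system provides, recomputed each frame as the vision-based tracker reports a new head position.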

Another problem with VR is that users must wear special-purpose equipment (HMDs, gloves, head trackers, etc.). For some applications that poses no problem at all. However, it does pose a problem for everyday or extended use. For example, an office worker is unlikely ever to be willing to put on such equipment. In addition, extended use of such devices can be extremely fatiguing. Computer vision again provides a potential solution, allowing us to do head tracking and gesture recognition without requiring the user to wear any equipment.

Computer vision enables a number of other capabilities that make 3D interactivity more effective and enjoyable. Adding awareness to our systems becomes possible. The system can know whether the user is present, whether the user is facing the screen, whether the user is engaged in some other activity (like talking on the phone or to another person in the room), and what the user is looking at on the screen. Computer vision can allow us to pay attention not just to hand gestures, but also to body posture and body gestures. Ultimately, with facial expression analysis, we should be able to detect mood changes in the user (e.g., happy, sad, confused, or angry) and have our 3D interfaces react appropriately.

Combining computer vision and 3D does involve solving some problems. The devices (cameras) are not expensive and are becoming ubiquitous. It is likely that in the near future the standard PC will include a camera. However, computer vision is still quite computationally expensive. We currently need to use multiprocessors, which are a bit more expensive. But with processor prices continuing to fall, the use of multiple processors is becoming more common. We are nearing a point where computer vision and 3D interfaces can be effectively integrated to enable a number of exciting new interface capabilities.

References:

Deering, M. (1992), High resolution virtual reality, in Computer Graphics, 26, 2, 195-202.

Pausch, R., Proffitt, D., and Williams, G. (1997), Quantifying immersion in virtual reality, SIGGRAPH'97.

Robertson, G., Czerwinski, M., and van Dantzich, M. (1997), Immersion in Desktop Virtual Reality, UIST'97.

Ware, C., Arthur, K., and Booth, K. (1993), Fish tank virtual reality, CHI'93 Proceedings, 37-42.


This page updated 16 December, 1998 / George Robertson