Image Processing:
The Future Vital Link?

James F. Blinn

Jet Propulsion Laboratory

Pasadena, California 91103

1970

Computer Graphics and Image Processing have, until recently, been treated as fairly disjoint subjects. Since both disciplines deal with pictures, each has many techniques and insights that may benefit the other. The purpose of this paper is to examine this exchange, primarily from the point of view of the Computer Graphics community making use of the techniques of Image Processing. We begin by describing the basic relationship between Image Processing and Computer Graphics, as depicted in figure 1.

Computer graphics consists of algorithms and data structures for the purpose of displaying data in pictorial form. The most common and most easily generated types of images consist of graphs or bar charts, moving up in complexity to diagrammatic images such as circuit drawings. The most elaborate type of graphic image takes the form of a shaded simulation of a three dimensional scene. These pictures are generated by scanning some database that describes the system of interest. Computer graphics can then be characterized as accepting data as input and generating pictures as output.

Image Processing and Pattern Recognition, on the other hand, begin with an already existing image, typically a digitized photograph. There are two main types of processing that they perform. The first is the image enhancement function. In this case, the desired output is simply another picture that "looks better" than the original. Either the contrast is enhanced, or various types of noise or geometric distortion are removed. The second, and more ambitious, function is to extract some information from the image concerning what it is a picture of. This can simply be a matter of identifying regions of common texture, or it may concern determining whether or not a certain object appears in the picture. The most advanced image understanding systems attempt to reconstruct a complete three dimensional description of the object from just a few images of it. In effect, this second function of image processing takes pictures as input and produces a database as output.

It is the realization that computer graphics and image processing are inverse operations that defines the common ground between them. Computer Graphics can make use of image processing techniques to enhance the images it makes or to digitize the objects it is to depict. Image processing can use computer graphics to generate controlled test images or to preview planned image transformations. In fact, for some of the more advanced applications in each field, knowledge flows back and forth so freely that the two fields become indistinguishable.

The central item in either of the two operations is the database. Many aspects of computer graphics concern geometric operations on this data, often beyond what is strictly necessary to draw pictures. Similarly, geometric modeling is important to many of the geometric reconstruction tasks of image processing. For this reason, geometric modeling techniques are sometimes included in graphics or image processing, whereas they are really a subject in their own right. Such techniques developed for one field are often the most easily transportable piece of knowledge to the other field.

COMMON HARDWARE

The first benefit to be gained by joining forces with the Image Processing community is the utilization of common hardware. This doubled market encourages manufacturers to produce more types of display equipment and helps lower the cost through mass production. In addition, the use of common hardware encourages intercommunication between practitioners in these fields.

The Random Access Frame Buffer

In the past, computer graphics has used rather specialized line drawing display devices driven by a relatively small refresh memory. These displays, though very accurate and fast, are, even now, quite expensive. Such displays are not suitable for image processing because they are not usually built with sufficient vector repeatability, intensity resolution or refresh memory to display continuous tone images. Image processing, in contrast, uses standard television monitors as its display device. To deal with shaded images these displays must be refreshed by a large memory containing an intensity value for each pixel. Early devices used relatively slow rotating disks or shift registers. Such devices were not often used for computer graphics because of their lack of speed and because of the lack of resolution afforded by television monitors.

Recently, the development of inexpensive random access memories has made it possible to create random access refresh memories that are large enough to devote an entire word of up to 24 bits to each pixel. The result is a display device called a random access frame buffer that is inexpensive and has capabilities that are attractive to both the computer graphics and the image processing communities. A diagram of such a device appears in figure 2. Its effective use for computer graphics has already been increased by some cross fertilization from image processing that helps overcome the resolution problem of television.

Color Maps

Frame buffers with 24 bits per pixel can devote a field of 8 bits to each of the color primaries red, green and blue. This allows all colors within the color gamut of a television display to be specified. More commonly, however, a frame buffer is provided with 8 bits per pixel which feed into a device called a color map. Rather than having the 256 possible values representable by the 8 bits be fixed to some restricted set of colors the color map allows this color space to be selected by the programmer. When the television signal is generated, rather than sending the 8 bits directly to a D/A converter, they are used as an index into a separate table stored in a separate fast memory: the color map. The color map thus contains 256 entries, each containing a field of up to 12 bits for each of the desired red, green and blue values of the pixel value.
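The lookup described above can be sketched in a few lines of Python. The names `color_map` and `scan_out` are illustrative, and the entries here hold 8 bits per primary rather than the 12 a real device might provide; this is a model of the idea, not of any particular hardware.

```python
# Minimal sketch of a color-mapped frame buffer (all names are hypothetical).
# Each pixel stores an 8-bit index; the color map turns it into (r, g, b).

MAP_SIZE = 256  # 8 bits per pixel -> 256 selectable colors

# Start with a grey ramp: entry i maps to equal red, green and blue.
color_map = [(i, i, i) for i in range(MAP_SIZE)]

# The programmer may redefine any entry, e.g. make index 7 pure red.
color_map[7] = (255, 0, 0)

def scan_out(frame_buffer, color_map):
    """Translate a 2-D array of pixel indices into (r, g, b) triples,
    as the video generator would do once per refresh."""
    return [[color_map[index] for index in row] for row in frame_buffer]

frame_buffer = [
    [0, 7, 255],
    [128, 7, 0],
]
image = scan_out(frame_buffer, color_map)
```

Changing one entry of `color_map` changes every pixel that carries that index, without touching `frame_buffer` itself.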

One of the original uses for this transformation of the pixel values is for contrast enhancement in image processing. An image which is digitized to 8 bit resolution might actually produce pixel values between, say, 50 and 120. Displaying such an image directly would produce a very washed out image. The user could, however, set up the color map to translate the value 50 into black and the value 120 into full white (with intermediate values linearly interpolated). This produces an image with much more contrast, which is thus more easily interpreted. The color mapping technique allows this to be accomplished without needing to do arithmetic on the quarter of a million pixel values in the image. In fact, the mapping table can be adjusted with various ranges and emphases interactively at frame refresh rates.
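As an illustration, the table for the 50-to-120 example might be built as follows. The function name `stretch_map` and the choice to clamp values outside the range are assumptions made for this sketch.

```python
def stretch_map(low, high, size=256, full=255):
    """Build a grey-scale color map that sends `low` to black and `high`
    to full white, interpolating linearly and clamping outside the range."""
    table = []
    for v in range(size):
        t = (v - low) / (high - low)
        t = min(1.0, max(0.0, t))      # clamp to [0, 1]
        level = round(t * full)
        table.append((level, level, level))
    return table

# Only 256 table entries are computed; the stored pixels are never modified.
contrast_map = stretch_map(50, 120)
```

Rebuilding the table with a different `low` and `high` costs 256 evaluations, which is why the adjustment can be made interactively.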

Another necessary function of the color map is to compensate for some of the nonlinear properties of the television display itself. Television monitors are designed so that the screen intensity is proportional not to the intensity voltage but to this voltage raised to some power. The inverse of this power function is typically placed in the color map so that the intensity on the screen will bear a linear relationship to the values placed in the frame buffer memory.
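A sketch of such a compensating table follows. The exponent 2.2 is an assumed, typical value and not one given in this paper; in practice the exponent would be measured for the monitor in use. `displayed_intensity` is a hypothetical model of the monitor, included only to check the table.

```python
def gamma_map(gamma=2.2, size=256, full=255):
    """Color map holding the inverse of the monitor's power law, so that
    screen intensity becomes linear in the stored pixel value."""
    table = []
    for v in range(size):
        linear = v / (size - 1)              # intensity we want on the screen
        voltage = linear ** (1.0 / gamma)    # inverse of the power function
        level = round(voltage * full)
        table.append((level, level, level))
    return table

def displayed_intensity(level, gamma=2.2, full=255):
    """Assumed monitor model: intensity is the drive level raised to `gamma`."""
    return (level / full) ** gamma

table = gamma_map()
mid_grey = displayed_intensity(table[128][0])   # close to 128 / 255
```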

Such a device, while initially developed for image processing purposes, has now become a tool of interactive graphics. One remaining disadvantage of the frame buffer for graphics is its lack of real time motion capability. The sheer number of pixels makes it difficult to do anything to substantial areas of the screen in a very short time. Some motion effects can be generated, however, by cleverly specifying the image encoded in the pixel values. The color map, due to its small size, can be modified at frame refresh rates to alter the interpretation of the pixels. A description of several such schemes is given by Shoup [9].
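One simple scheme of this kind is to paint the image once with a repeating run of pixel values and then rotate the corresponding color map entries on each frame. The sketch below is an illustration under that assumption and is not necessarily one of the schemes described in [9]; `cycle_entries` is a hypothetical name.

```python
def cycle_entries(color_map, first, last):
    """Rotate entries first..last (inclusive) by one position.
    Entries outside that band are left untouched."""
    band = color_map[first:last + 1]
    rotated = band[-1:] + band[:-1]
    return color_map[:first] + rotated + color_map[last + 1:]

BLACK, WHITE = (0, 0, 0), (255, 255, 255)
color_map = [BLACK] * 256
color_map[1] = WHITE

# A row of stripes, written into the frame buffer once and never changed.
pixels = [1, 2, 3, 4, 1, 2, 3, 4]

frames = []
for _ in range(4):
    frames.append([color_map[p] for p in pixels])   # what the screen shows
    color_map = cycle_entries(color_map, 1, 4)      # only 4 entries move
```

The white stripe appears to travel along the row even though no pixel value is rewritten.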

UNDERSTANDING THE IMAGING PROCESS

The next area of contribution to computer graphics concerns the mathematical basis of digital images. Image processing treats the true or ideal image as a continuous intensity function of the x,y screen space. To display it one must sample this function on a regular grid. The sampling of any continuous function is subject to the sampling theorem, which states that a sampled signal cannot reproduce any frequency components higher than half the sampling frequency. When a signal with high frequency components (e.g. sharp edges) is sampled at too low a frequency, the high frequencies return under the aliases of low frequencies. See figure 3. The solution to this problem is first to remove the high frequency components from the signal and then to sample. Visually this means that the image must be blurred by an appropriate amount to make the edges spread over one inter-pixel distance. Another way of saying this is that a pixel value should be some weighted average of the intensities in the neighborhood of the pixel.

In synthesizing shaded images a computer graphics program must calculate an intensity value for each pixel on the screen. This is usually done by a series of geometrical calculations based on rays between the eye position and each pixel on the screen. If a particular ray intersects an object in the scene then that object is visible at the corresponding pixel. Images generated by rays through the center of each pixel will produce a stairstep effect at sharp edges of objects. See figure 4. This is an example of aliasing. The solution, area averaging, is known to alleviate the problem, but its efficient implementation within computer graphics algorithms is still a subject for research.
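The difference between sampling at the pixel center and area averaging can be seen on a single edge. In this sketch the average is approximated by an unweighted mean of a grid of subsamples, which is only one possible weighting; the `scene` function and the 4 by 4 grid are assumptions chosen for illustration.

```python
def scene(x, y):
    """Ideal continuous image: white (1.0) below the line y = 0.3 * x,
    black (0.0) above it."""
    return 1.0 if y < 0.3 * x else 0.0

def point_sample(px, py):
    """One ray through the center of pixel (px, py)."""
    return scene(px + 0.5, py + 0.5)

def area_average(px, py, n=4):
    """Approximate the average intensity over the pixel with an
    n-by-n grid of subsamples, all weighted equally."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += scene(px + (i + 0.5) / n, py + (j + 0.5) / n)
    return total / (n * n)

# Pixel (5, 1) straddles the edge.
hard = point_sample(5, 1)   # all or nothing
soft = area_average(5, 1)   # partial coverage
```

Pixels that lie wholly on one side of the edge get the same value from both functions; only the pixels the edge passes through receive intermediate intensities.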

Another aspect of the imaging process is the study of the perceptual mechanisms of the eye. Since the imaging process is one of approximating a real image by some mechanical means, it is important to know which features of an image the eye is most sensitive to and which it is less sensitive to. Image processing has studied the eye for some time [10]. The resulting models reveal the eye as a spatial bandpass filter [8]; see figure 5. Low frequency intensity fluctuations across an image are suppressed while high frequencies are emphasized. This serves to aid the perception mechanism by enhancing edges in the scene. It has implications in the production of shaded images, particularly when smoothly curved surfaces are to be displayed. Early attempts to simulate such surfaces as collections of polygons ran afoul of just this sensitivity of the eye. Simple techniques to remove sharp edge transitions [5] go a long way toward bringing about the illusion of smooth surfaces. On the other end of the eye response scale, very high frequencies tend to get blurred out by the eye. This allows the use of a technique for expanding the effective grey scale of a display device, known as dither. Dither effectively adds low amplitude, high frequency noise to an image to spatially scatter pixels near the transitions between quantization levels. The blurring action of the eye will then make the quantization jump appear as a smooth transition.
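A minimal sketch of dither follows, using uniform random noise of one quantization step in amplitude. The noise distribution, the seed, and the use of a plain mean to stand in for the blurring action of the eye are all assumptions of this example.

```python
import random

def quantize(value, levels):
    """Round a value in [0, 1] to the nearest of `levels` evenly spaced
    grey levels."""
    step = 1.0 / (levels - 1)
    q = round(value / step) * step
    return min(1.0, max(0.0, q))

def dither_quantize(value, levels, rng):
    """Add noise of up to half a quantization step, then quantize."""
    step = 1.0 / (levels - 1)
    noise = rng.uniform(-0.5, 0.5) * step
    return quantize(value + noise, levels)

rng = random.Random(1)   # fixed seed so the sketch is repeatable
true_value = 0.3         # lies between the two levels of a 2-level display

plain = [quantize(true_value, 2) for _ in range(10000)]
dithered = [dither_quantize(true_value, 2, rng) for _ in range(10000)]

# Averaging over a neighborhood stands in for the blur of the eye.
plain_mean = sum(plain) / len(plain)
dithered_mean = sum(dithered) / len(dithered)
```

Without dither every pixel falls to the same level and the intermediate grey is lost; with dither the local average stays near the intended value.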

In image processing it is important to accurately measure and display the intensities of pixels since they are the inputs to the mathematical algorithms applied. This has prompted efforts to compensate for inaccuracies in the digitizing and reproducing process. This same accuracy of intensity display and measurement becomes more and more important in computer graphics as more image processing is involved in the picture generation process. In particular, if accurate anti aliasing is performed, it is extremely important that the calculated intensities are accurately reproduced on the display or film. Inaccurate reproduction may completely defeat the anti aliasing process. The general mechanism of intensity correction involves measuring intensities in a test image to determine the transfer function from the intended to the actual intensities. By passing all desired intensities through the inverse of this function, a false intensity can be calculated which will appear correct on the display device. See Catmull [2].
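The inversion step can be sketched as a table search. The measurements below are synthetic (a square law invented for the example), and `build_compensation_table` is a hypothetical name; this is not the procedure of [2], only an illustration of inverting a measured transfer function.

```python
def build_compensation_table(measured):
    """measured[v] is the intensity (0..1) actually produced when value v
    is sent to the device. Returns a table such that sending table[d]
    produces the intensity closest to d / (n - 1)."""
    n = len(measured)
    table = []
    for desired_index in range(n):
        desired = desired_index / (n - 1)
        best = min(range(n), key=lambda v: abs(measured[v] - desired))
        table.append(best)
    return table

# Synthetic measurements: a device whose output follows a square law.
measured = [(v / 255) ** 2 for v in range(256)]
table = build_compensation_table(measured)

# Asking for quarter intensity now sends the "false" value table[64].
actual = measured[table[64]]
```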

DATA INPUT FROM IMAGE PROCESSING

The next major application of the techniques of image processing concerns database generation. This database may be either a three dimensional representation of some shape or an image of the surface features on the objects. The ideal general purpose image understanding system, illustrated in figure 6, can take a few photographs of an object and build a database describing the three dimensional shape. Such a system is still some way off, but various special case solutions have had some success. We will here discuss some of them to see how they point the way for future efforts.

Three Dimensional Shape Reconstruction

The most manual technique begins with two images of the desired object. These may be the orthographic side and end views provided by blueprints. The same technique can, however, be applied if the images are two arbitrary perspective views, as described by Sutherland [11]. The only restriction is that key points on the object must be visible from both views. The user manually digitizes the same point on each image and mathematical techniques are used to reconstruct the three coordinates of the point.
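For the orthographic case the reconstruction is nearly trivial, as the following sketch shows. The axis convention (side view looking along x, end view looking along z, with y shared by both) is an assumption of this example; the perspective case of [11] requires intersecting two rays and is not shown.

```python
def reconstruct_point(side_view, end_view):
    """Combine one point digitized on two orthographic drawings.

    side_view: (z, y) read from the view looking along the x axis
    end_view:  (x, y) read from the view looking along the z axis
    Both views show the height y, so the two readings are averaged
    to reduce digitizing error."""
    z, y_side = side_view
    x, y_end = end_view
    return (x, (y_side + y_end) / 2.0, z)

# The same corner of an object, digitized once on each drawing.
point = reconstruct_point(side_view=(3.0, 2.0), end_view=(1.0, 2.2))
```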

A similar case occurs when the original images represent many serial sections through an object. This is the case in Computer Aided Tomography as well as when the object is physically sectioned. In this case the outlines of the object are digitized either manually or via edge detection from a scanned image. Each pair of consecutive sections must then be covered with a skin of polygons or some higher order surface to reconstruct a three dimensional model of the object. Various algorithms have been devised to do this semi-automatically [3,4].
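A simple way to skin two contours with triangles is to walk around both at once and, at each step, add whichever triangle has the shorter new edge. The sketch below uses that greedy rule as an assumption; it is not presented as the method of either cited paper, and in particular it is not the optimal algorithm of [4]. It also assumes the contours run in the same direction and start at nearby points.

```python
import math

def skin_contours(lower, upper):
    """Connect two closed contours (lists of (x, y, z) points) with a
    band of triangles, using a greedy shortest-diagonal rule."""
    m, n = len(lower), len(upper)
    i = j = 0
    triangles = []
    while i < m or j < n:
        a = lower[i % m]
        b = upper[j % n]
        next_a = lower[(i + 1) % m]
        next_b = upper[(j + 1) % n]
        # Step along the lower contour if the upper one is used up, or if
        # the diagonal that step creates is no longer than the alternative.
        advance_lower = j >= n or (
            i < m and math.dist(next_a, b) <= math.dist(a, next_b)
        )
        if advance_lower:
            triangles.append((a, b, next_a))
            i += 1
        else:
            triangles.append((a, b, next_b))
            j += 1
    return triangles

# Two square sections, one unit apart.
lower = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
upper = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
triangles = skin_contours(lower, upper)
```

Each step consumes one contour edge, so two contours of m and n points always yield m + n triangles.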

A more automatic method of three dimensional reconstruction has been developed by Baumgart [1]. Several views of the object are scanned in as photographs. Edge detection techniques are used to find the silhouette edge of each image. Each image then defines a generalized cone with the apex at the camera position and the sides passing through the silhouette edge. The entire object lies within this cone. The intersection of several such generalized cones produces a shape approximating the initial object. Calculating the intersection of two complicated objects is an intricate but very useful geometric modeling task, and Baumgart's work on it in the context of image understanding is among the earliest solutions. In addition, the data structures developed for this project have applicability in many other aspects of geometric modeling.
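The idea of intersecting silhouette volumes can be illustrated with a much cruder representation than the polyhedra of [1]. The sketch below substitutes a voxel grid for the polyhedral model and parallel projection along the three axes for true perspective cones, so each "cone" degenerates to a prism; both substitutions are simplifications made for this example only.

```python
def disc(n, radius):
    """Silhouette of a centered sphere on an n-by-n image: a filled disc,
    stored as a set of (u, v) cells."""
    c = (n - 1) / 2.0
    return {(u, v) for u in range(n) for v in range(n)
            if (u - c) ** 2 + (v - c) ** 2 <= radius ** 2}

def intersect_silhouettes(n, sil_x, sil_y, sil_z):
    """Keep the voxels whose projections along x, y and z all fall
    inside the corresponding silhouette."""
    return {(i, j, k)
            for i in range(n) for j in range(n) for k in range(n)
            if (j, k) in sil_x and (i, k) in sil_y and (i, j) in sil_z}

n = 11
silhouette = disc(n, 4.0)
hull = intersect_silhouettes(n, silhouette, silhouette, silhouette)

# The true object, for comparison: a sphere of the same radius.
c = (n - 1) / 2.0
sphere = {(i, j, k)
          for i in range(n) for j in range(n) for k in range(n)
          if (i - c) ** 2 + (j - c) ** 2 + (k - c) ** 2 <= 16.0}
```

The result always contains the true object and, with only three views, somewhat more; additional views would trim the excess.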

Another automated technique that is useful in certain cases begins with stereo pairs of images. In such pairs the view of an object in the distance is very similar between the views. Corresponding regions of the two images are identified by correlation techniques. The relative displacements in the horizontal direction can be used to calculate the distance to the object. This has been used primarily for ranging of single objects but it may in the future be used for shape determination.
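A one-scanline sketch of the two steps follows. Matching here minimizes a sum of squared differences, used as a simple stand-in for the correlation measure mentioned above, and the depth formula assumes two parallel, identical cameras. The focal length and baseline values are invented for the example.

```python
def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Find the horizontal shift d for which the window around
    left_row[x] best matches the window around right_row[x - d].
    The caller must keep all indices inside both rows."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0.0
        for k in range(-half_window, half_window + 1):
            diff = left_row[x + k] - right_row[x + k - d]
            cost += diff * diff
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(disparity, focal_length, baseline):
    """Distance for parallel cameras: focal_length * baseline / disparity.
    Zero disparity corresponds to a point at infinity."""
    if disparity == 0:
        return float("inf")
    return focal_length * baseline / disparity

# The same feature (9, 5, 1) seen three pixels apart in the two images.
left_row = [0, 0, 0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right_row = [0, 0, 0, 9, 5, 1, 0, 0, 0, 0, 0, 0]

d = best_disparity(left_row, right_row, x=7, half_window=1, max_disparity=4)
distance = depth_from_disparity(d, focal_length=600.0, baseline=0.5)
```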

The preceding methods make use of the geometric properties of the several images used. Since the eye and brain can still make some three dimensional interpretation of an image from just the lighting information, there is some hope that this process can be performed by computers also. Computer graphics must calculate shading intensities by some model of light reflection from surfaces. Such models are based on the direction of the surface normal and the light source direction. This process can be inverted by taking a photograph as the source of input intensities and calculating the normal vector directions that must have been on the object to generate those intensities. The solution can be exact if there are three images that are identical except for the light source direction. A more heuristic approach is necessary if only one view is available, but some success has been obtained by Horn [6]. Further results in this direction require more accurate light reflection models and better methods of subdividing an image into regions of similar objects.
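The three-image case can be worked through for a single pixel. This sketch assumes the simplest reflection model, in which intensity equals the surface reflectance times the cosine of the angle between the normal and the light direction; that model, the light directions, and the function names are assumptions of the example. The three intensities then give three linear equations in the three components of the scaled normal.

```python
import math

def det3(m):
    """Determinant of a 3-by-3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve m * g = rhs by Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        result.append(det3(replaced) / d)
    return result

def normal_from_intensities(lights, intensities):
    """lights: three unit light directions (one per image).
    intensities: the pixel's value in each image.
    Returns (unit normal, reflectance)."""
    g = solve3(lights, intensities)           # g = reflectance * normal
    reflectance = math.sqrt(sum(c * c for c in g))
    normal = [c / reflectance for c in g]
    return normal, reflectance

lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
# Intensities that a surface facing +z with reflectance 0.5 would produce.
intensities = [0.5, 0.4, 0.4]
normal, reflectance = normal_from_intensities(lights, intensities)
```

The three light directions must not lie in one plane, or the determinant vanishes and the system has no unique solution.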

Texture Pattern Reconstruction

A technique becoming more and more common for the production of realistic pictures in Computer Graphics is called texture mapping. This is used to give surface detail to the otherwise mathematically smooth and homogeneous surfaces used to model objects. The texture in question is defined by an array of surface intensities which are keyed to the surface by some bivariate function embedded in it, e.g. the latitudes and longitudes of a sphere. The generation of interesting and realistic texture arrays is important to produce a pleasing effect. One way to generate such data is to process photographs of real textured surfaces by an inverse picture generation technique. In this case one starts with a picture and calculates the texture pattern. For this to be possible, the viewing transformation in effect for the original photograph must be known. The texture array is then scanned, each location is transformed into screen space, and the intensity is retrieved from the photograph. See figure 7. This technique has been used at JPL for the purpose of mapping the moons of Jupiter. This map database can then be used to make images of the moons from directions other than those actually seen.
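The scan over the texture array can be sketched for a sphere as follows. The viewing transformation assumed here is the simplest possible one (parallel projection with the viewer on the +z axis and the sphere exactly filling a square photograph), and the photograph is sampled at the nearest pixel; neither choice is claimed to match the JPL work.

```python
import math

def extract_texture(photo, tex_width, tex_height):
    """Fill a latitude/longitude texture array from a photograph of a
    unit sphere. photo[row][col] is a square image; texels on the far
    side of the sphere are left as None."""
    size = len(photo)
    texture = [[None] * tex_width for _ in range(tex_height)]
    for row in range(tex_height):
        lat = math.pi * ((row + 0.5) / tex_height - 0.5)         # -90..90 deg
        for col in range(tex_width):
            lon = 2 * math.pi * ((col + 0.5) / tex_width - 0.5)  # -180..180 deg
            # Point on the sphere for this texture location.
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            if z <= 0.0:
                continue  # faces away from the camera in this photograph
            # Transform into screen space and fetch the nearest pixel.
            px = min(size - 1, int((x + 1.0) / 2.0 * size))
            py = min(size - 1, int((1.0 - y) / 2.0 * size))
            texture[row][col] = photo[py][px]
    return texture

# Test photograph in which each pixel simply records its own column.
photo = [[col for col in range(8)] for _ in range(8)]
texture = extract_texture(photo, tex_width=8, tex_height=4)
```

Texels left as `None` would have to be filled from photographs taken from other directions.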

CONJOINED APPLICATIONS

As computer graphics becomes more and more concerned with raster type displays, the production of images will look more and more like image processing. This comes from a basic reorganization of thinking about the image creation process. In line drawing systems, one program and one database were necessary for the creation of the image since it was completely a one way data path: from the program to the screen. To make an image of overlaid objects, the program had to know about all portions of the database which overlapped a particular portion of the screen.

In the case of raster images using frame buffers, a program may combine two images from totally disjoint sources with no knowledge of where they came from. A computer graphics algorithm known as the painter's algorithm [7] utilizes this capability by drawing all visible objects in order from back to front. Each image overwrites that which is currently in the buffer, thus hiding the farther parts. Entirely different algorithms may therefore be used for the different objects in the scene. This is aided by the fact that the program can read back portions of an image from the frame buffer memory and do some local image processing on it. In fact, a modern raster graphics installation is becoming a collection of simple programs for "doing things" to the picture currently in the buffer or for combining several images into a new one.
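The back-to-front overwrite can be sketched with axis-aligned rectangles standing in for objects. Giving each object a single depth value is an assumption of this example; it sidesteps the cases in which objects overlap in depth, which a complete algorithm must resolve.

```python
def painters_algorithm(rectangles, width, height, background="."):
    """Draw rectangles into a frame buffer from farthest to nearest.
    Each rectangle is a dict with x0, y0 (inclusive), x1, y1 (exclusive),
    a depth (distance from the viewer) and a color."""
    frame_buffer = [[background] * width for _ in range(height)]
    for rect in sorted(rectangles, key=lambda r: r["depth"], reverse=True):
        for y in range(rect["y0"], rect["y1"]):
            for x in range(rect["x0"], rect["x1"]):
                frame_buffer[y][x] = rect["color"]   # overwrite what is there
    return frame_buffer

scene = [
    {"x0": 2, "y0": 1, "x1": 6, "y1": 4, "depth": 1.0, "color": "N"},  # near
    {"x0": 0, "y0": 0, "x1": 4, "y1": 3, "depth": 5.0, "color": "F"},  # far
]
picture = painters_algorithm(scene, width=6, height=4)
```

The drawing loop never compares one object with another; the frame buffer itself resolves which object is seen at each pixel.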

REFERENCES

[1] Baumgart, B. G., Geometric modeling for computer vision, Stanford Univ. Comput. Sci. Dept., AIM-249, STAN-CS-74-463, Oct 1974.

[2] Catmull, E. E., A tutorial on compensation tables, Computer Graphics, Vol. 13, No. 2 (Aug 1979), pg. 1.

[3] Christiansen, H. N., Sederberg, T. W., Conversion of complex contour line definitions into polygonal element mosaics, Computer Graphics, Vol. 12, No. 3 (Aug 1978), pg. 187.

[4] Fuchs, H., et al., Optimal surface reconstruction from planar contours, CACM, 20(10):693, October 1977.

[5] Gouraud, H., Computer display of curved surfaces, IEEE Transactions, C-20(6):623, June 1971.

[6] Horn, B. K. P., Determining shape from shading, Chapter 4 in Winston, P. H., (ed.): "The Psychology of Computer Vision", McGraw Hill, 1975.

[7] Newell, M. E., Newell, R. G., and Sancha, T. L., A new approach to the shaded picture problem, Proc. ACM Nat. Conf., 1972, pg. 443.

[8] Pearson, D. E., Transmission and Display of Pictorial Information, Pentech Press, 1975.

[9] Shoup, R. G., Color table animation, Computer Graphics, Vol. 13, No. 2 (Aug 1979), pg. 8.

[10] Stockham, T. G. Jr., Image processing in the context of a visual model, Proc. IEEE, Vol 60 (1972), pg. 828.

[11] Sutherland, I. E., Three dimensional data input by tablet, Proc. IEEE, April 1974, Reprinted in "Tutorial: Computer Graphics", IEEE Computer Soc., 1979.