
370 Part 2 | Regions of Computer Space
Section 4 | Multiple-Processor Systems

The auto-restart mechanism is responsible for reloading the system and is invoked by the suspect/monitor mechanism. Three basic steps are involved: adjusting the configuration masks for any deleted or quiesced processors, constructing a free memory list (deleting pages that have been marked errant), and loading a fresh copy of the kernel from disk. The new system is entered and initialization begins. This sequence is normally accomplished without human intervention and is so reliable that C.mmp runs without an operator.
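The three steps can be illustrated with a short sketch in Python. This is only an illustration of the sequence described above, not the Hydra implementation: the names `adjust_config_mask`, `build_free_list`, `auto_restart`, and `load_kernel`, as well as the representation of the configuration mask as an integer bit mask with one bit per processor, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class RestartState:
    """Hypothetical container for what the restart hands to the new kernel."""
    config_mask: int        # one bit per processor that remains in the system
    free_pages: List[int]   # page numbers available to the fresh kernel
    kernel_image: bytes     # fresh copy of the kernel read from disk


def adjust_config_mask(mask: int, removed: Set[int]) -> int:
    """Step 1: clear the bit of every deleted or quiesced processor."""
    for cpu in removed:
        mask &= ~(1 << cpu)
    return mask


def build_free_list(total_pages: int, errant: Set[int]) -> List[int]:
    """Step 2: list every page except those marked errant."""
    return [page for page in range(total_pages) if page not in errant]


def auto_restart(mask: int, removed: Set[int], total_pages: int,
                 errant: Set[int],
                 load_kernel: Callable[[], bytes]) -> RestartState:
    """Run the three steps in order; the caller then enters the new system."""
    new_mask = adjust_config_mask(mask, removed)
    free_pages = build_free_list(total_pages, errant)
    image = load_kernel()   # Step 3: assumed to read the kernel from disk
    return RestartState(new_mask, free_pages, image)
```

For example, with four processors of which two have been deleted and with two errant pages out of six, `auto_restart(0b1111, {1, 3}, 6, {2, 4}, loader)` would leave processors 0 and 2 configured and pages 0, 1, 3, and 5 on the free list.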

The last mechanism associated with failure recovery is the automatic diagnostic driver, which initiates and monitors a deleted processor's execution of a diagnostic. The driver maintains a history of the failures found by each processor as well as the processor's successful executions of the diagnostic. The histories may be printed on command and are also accessible from Hydra. If a processor is able to run the diagnostic successfully for a period of time determined by its failure history over the previous few days, the driver automatically returns it to the system. Automatic return is accomplished by executing the standard per-processor initialization and does not require pausing or reloading the system.
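The return policy can be sketched as follows. The text states only that the required period of successful operation depends on the processor's failure history over the previous few days; the specific rule used here (a base period that doubles for each recent failure), the three-day window, and all names are assumptions chosen for illustration, not the driver's actual policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DAY = 24 * 3600.0  # seconds


@dataclass
class DiagnosticHistory:
    """Hypothetical per-processor record kept by the diagnostic driver."""
    failure_times: List[float] = field(default_factory=list)
    success_count: int = 0
    clean_since: Optional[float] = None  # start of the current failure-free run

    def record_failure(self, now: float) -> None:
        self.failure_times.append(now)
        self.clean_since = None          # any failure ends the clean run

    def record_success(self, now: float) -> None:
        self.success_count += 1
        if self.clean_since is None:
            self.clean_since = now       # first success starts a new clean run


def required_clean_period(history: DiagnosticHistory, now: float,
                          window_days: float = 3.0,
                          base_seconds: float = 600.0) -> float:
    """Assumed rule: double the base period for each failure in the window."""
    recent = sum(1 for t in history.failure_times
                 if now - t <= window_days * DAY)
    return base_seconds * (2 ** recent)


def ready_to_return(history: DiagnosticHistory, now: float) -> bool:
    """True once the clean run is at least as long as the required period."""
    if history.clean_since is None:
        return False
    return now - history.clean_since >= required_clean_period(history, now)
```

Under these assumed parameters, a processor with one recent failure must run the diagnostic cleanly for 1200 seconds before it is returned, while a processor with no recent failures needs only 600 seconds. A rule of this shape keeps a processor with intermittent faults out of the system for longer without requiring operator action.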

 

4. Conclusion

The successful implementation of systems such as Harpy, ZOG, several language compilers, several file and directory systems, ARPANET support, and measurement tools such as the script driver has shown that C.mmp and Hydra provide a useful, general-purpose computing environment on a multiprocessor. The symmetric design of C.mmp has proved to be valuable in error-recovery techniques and in simplifying process scheduling. Also, the kernel approach to operating-system design, the protection system, and the mechanisms for data abstraction have effectively allowed construction of much of the operating system as user-level programs.

The problems, such as reliability, memory contention, and the small-address problem, have been effectively managed, if not solved entirely. These problems were challenging and the reliability problems, especially, motivated a profitable research effort.

 

References

Almes and Robertson [1978]; Bellis [1978]; Bhandarkar [1972]; Cohen and Jefferson [1975]; DEC [1972]; Dijkstra [1968a]; Fuller, Almes, Broadley, Forgy, Karlton, Lesser, and Teter [1976]; Fuller and Harbison [1978]; Jam [1978]; Levin, Cohen, Corwin, Pollack, and Wulf [1975]; Lowerre [1976]; Marathe [1977]; McGehearty [1980]; Newcomer, Cohen, Jefferson, Lane, Levin, Pollack, and Wulf [1976]; Oleinick [1979]; Oleinick and Fuller [1978]; Parnas [1972]; Robertson and Ramakrishna [1977]; Rubin, Guggenheim, and Bihary [1978]; Schroeder [1972]; Siewiorek, Kini, Joobbani, and Bellis [1978]; Strecker [1971]; Swan [1976]; Wulf and Bell [1972]; Wulf, Cohen, Corwin, Jones, Levin, Pierson, and Pollack [1974]; Wulf and Harbison [1978]; Wulf, Levin, and Harbison [1980]; Wulf, Levin, and Pierson [1975]; Wulf, Russell, and Habermann [1971].
