The IBM System/360 Model 91:
Machine Philosophy and Instruction-Handling¹
D. W. Anderson / F. J. Sparacio / F. M. Tomasulo
Abstract The System/360 Model 91 central processing unit provides internal computational performance one to two orders of magnitude greater than that of the IBM 7090 Data Processing System through a combination of advancements in machine organization, circuit design, and hardware packaging. The circuits employed will switch at speeds of less than 3 nsec, and the circuit environment is such that delay is approximately 5 nsec per circuit level. Organizationally, primary emphasis is placed on (1) alleviating the disparity between storage time and circuit speed, and (2) the development of high speed floating-point arithmetic algorithms.
This paper deals mainly with item (1) of the organization. A design is described which improves the ratio of storage bandwidth and access time to cycle time through the use of storage interleaving and CPU buffer registers. It is shown that history recording (the retention of complete instruction loops in the CPU) reduces the need to exercise storage, and that sophisticated employment of buffering techniques has reduced the effective access time. The system is organized so that execution hardware is separated from the instruction unit; the resulting smaller, semiautonomous "packages" improve intra-area communication.
This paper presents the organizational philosophy utilized in IBM's highest performance computer, the System/360 [Amdahl, Blaauw, and Brooks, 1964] Model 91. The first section of the paper deals with the development of the assembly-line processing approach adopted for the Model 91. The organizational techniques of storage interleaving, buffering, and arithmetic execution concurrency required to support the approach are discussed. The final topic of this section deals with design refinements which have been added to the basic organization. Special attention is given to minimizing the time lost due to conditional branches, and the basic interrupt problem is covered.
The second section treats the instruction unit of the Model 91. It is in this unit that the basic control is exercised which leads to attainment of the performance objectives. The first topic is the fetching of instructions from storage. Branching and interrupting are discussed next. Special handling of branching, such that storage accessing by instructions is sometimes eliminated, is also treated. The final section discusses the interlocks required among instructions as they are issued to the execution units, the initiation of operand fetches from storage, status switching operations, and I/O handling.
The objective of the Model 91 is to attain a performance greater by one to two orders of magnitude than that of the IBM 7090. Technology (that is, circuitry and hardware) advances² alone provide only a four-fold performance increase, so it is necessary to turn to organizational techniques for the remaining improvement. The appropriate selection of existing techniques and the development of new organizational approaches were the objectives of the Model 91 CPU design.
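The size of the "remaining improvement" can be estimated with a small calculation. The sketch below assumes that technology and organizational gains combine multiplicatively, which the text does not state explicitly; the function name and its parameters are illustrative only.

```python
def required_organizational_gain(target_speedup: float,
                                 technology_gain: float = 4.0) -> float:
    """Return the factor the machine organization must supply.

    Assumption (not from the paper): total speedup is the product of the
    technology gain and the organizational gain.
    """
    if technology_gain <= 0:
        raise ValueError("technology_gain must be positive")
    return target_speedup / technology_gain


# One and two orders of magnitude over the IBM 7090, with the
# four-fold technology gain quoted in the text.
low = required_organizational_gain(10.0)
high = required_organizational_gain(100.0)
print(f"Organization must supply roughly {low:g}x to {high:g}x")
```

Under that assumption, the organization would have to contribute a factor of roughly 2.5 to 25 beyond what circuitry and packaging provide.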
The primary organizational objective for a high performance CPU is concurrency: the parallel execution of different instructions. A consideration of the sequence of functions involved in handling a typical processor instruction makes the need for this approach evident. This sequence (instruction fetching, instruction decoding, operand address generating, operand fetching, and instruction execution) is illustrated in Fig. 1. Clearly, a primary goal of the organization must be to avoid the conventional concatenation of the illustrated functions for successive instructions. Parallelism accomplishes this, and, short of simultaneously performing identical tasks for adjacent instructions, it is desired to "overlap" the separate instruction functions to the greatest possible degree. Doing this requires separation of the CPU into loosely coupled sets of hardware, much like an assembly line, so that each hardware set, similar to its assembly line station counterpart, performs a single specific task. It then becomes
¹ IBM Journal, vol. 11, January 1967, pp. 8-24.
² Circuits employed are from the IBM ASLT family and provide an in-environment switching time in the 5 nsec range.