reduced by eliminating the instruction fetches that occur in the service routines. These results are also shown in the table. Comparison of the results shows that the micro-thread implementation is faster (as expected), but also that its speed is no better than that of FORTRAN IV-PLUS. Could this be coincidence or is there reason to believe these results should be obtained?
To answer this, we formulated a simple intuitive model for the expected size and speed of code on an idealized FORTRAN machine. To estimate the code size:
• Count one unit for each variable that is referenced (e.g., A(I) counts as two).
• Count one unit for each operation performed (e.g., assignment or subscripting are unit operations).
To estimate the memory cycles for execution:
• Count one unit for each variable that is referenced.
• Count one unit for each operation performed.
• Count one, two, or four units for each value fetch or store operation depending on the size of the data.
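As a rough illustration, the two estimates can be written as a few lines of Python. The statement used, the type names, and the mapping of data sizes to one, two, or four units (taken here as 16-bit words for INTEGER*2, REAL*4, and REAL*8) are assumptions made for this sketch. They are not taken from Table 1.

```python
from dataclasses import dataclass

# Assumption: "one, two, or four units" means 16-bit words moved per value,
# e.g. INTEGER*2 = 1, REAL*4 = 2, REAL*8 = 4. The text does not spell this out.
UNITS_PER_VALUE = {"INTEGER*2": 1, "REAL*4": 2, "REAL*8": 4}


@dataclass
class Statement:
    """Hypothetical description of one FORTRAN statement for the model."""
    variables: int    # variable references; A(I) counts as two
    operations: int   # operations performed; assignment and subscripting count one each
    transfers: tuple  # data type of every value fetched or stored


def code_size(stmt):
    """Size estimate: one unit per variable reference and per operation."""
    return stmt.variables + stmt.operations


def memory_cycles(stmt):
    """Cycle estimate: the size units plus the data units moved."""
    data_units = sum(UNITS_PER_VALUE[t] for t in stmt.transfers)
    return stmt.variables + stmt.operations + data_units


# Illustrative statement (not from Table 1): I = J + K, all INTEGER*2.
# Three variable references, two operations (add, assign),
# and three value transfers (fetch J, fetch K, store I).
example = Statement(variables=3, operations=2, transfers=("INTEGER*2",) * 3)

print(code_size(example))      # 5
print(memory_cycles(example))  # 8
```

Under the same assumptions, changing the three transfers to one INTEGER*2 and two REAL*4 values leaves the size estimate at 5 but raises the cycle estimate to 10, which shows how the data-size term dominates for wider operands.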
This very simple model is appropriate only for compilers that produce code based only on isolated source information, which is true of the original FORTRAN. Optimizing compilers, such as FORTRAN IV-PLUS, do better than suggested by this model by eliminating or simplifying operations (for example, by constant expression elimination or by moving invariant computations out of loops) and/or by keeping values in registers instead of main memory, especially across loops. Consequently, the model serves primarily as a relatively implementation-independent frame of reference for comparing alternative implementations.
The sizes and cycle counts from this model for the sample statements are also shown in Table 1. These values are quite similar to values for both the micro-thread and FORTRAN IV-PLUS implementations.
We interpreted these results as a clear demonstration that a micro-threaded implementation could not significantly outperform the existing FORTRAN IV-PLUS implementation. Further, effort expended for greater performance would be better directed toward improved optimization in FORTRAN IV-PLUS (which would benefit existing hardware products) or toward faster hardware per se.*
There is also a broader interpretation of the results that is worth reflection. The threaded implementation was designed to be a good FORTRAN architecture. Yet, when implemented in microcode in a manner comparable with the host PDP-11 architecture, the performance is close to that achieved by the FORTRAN IV-PLUS compiler and also close to that of an "ideal" model. One is led to speculate that the PDP-11 with FP11 is also a good FORTRAN architecture.
Many individuals contributed to the design, implementation, and evolution of the PDP-11 FORTRAN product. The following were particularly involved in those aspects described in this paper. Jim Bell, Dave Knight, and the author participated in the initial design evaluation that led to the basic virtual machine. Dave was project leader for the first versions of the product. Rich Grove participated in the support of the FP11 and FIS options. The extended virtual machine design and implementation, and the microcode feasibility analysis were done by the author. Finally, Craig Mudge assisted in the preparation of this paper with valuable discussion and criticism, and by not accepting "no" for an answer.
* Note that Digital did both. FORTRAN IV-PLUS V2 and the FP11-C were both released in early 1976 with each offering significant performance improvements.