previous | contents | next

multiplication had been part of a larger system with several variables to be stored, then an M(16 word; scratchpad) could have been used and the savings would have been even greater. Conceivably, the larger memory might already be required by the other demands of the system, so that the additional memory cost for the cheap multiply would in fact be zero.

Fig. 16. RTM diagram of 8-bit multiplier, shared use of a common DMgpa.

REDUCTION OF THE CONTROL PART: UNWINDING LOOPS

In general, it is difficult to make changes in the control part of an algorithm per se (i.e., not associated with some other change in the data part, such as in all the prior examples). Given that the algorithm and the data part are both fixed, then each module in the control part tends to perform some specific step in the algorithm, which function cannot be eliminated. Any attempt to provide an alternative way of performing such a function would also involve control and would negate any conjectured savings. (Try sharing Ke's or Kb's.)

One exception to this that arises with great regularity is avoiding control operations associated with an iteration by unwinding the loop. This is a standard technique. in programming, where it forms part of a general space-time trade-off. Some of the time spent controlling the loop can be avoided if the program is written as straight-line code with the requisite number of copies of the loop- body (which take extra memory). The same philosophy exists in RT-level design, where the trade-off is in the saving of hardware to do the loop calculations versus the extra control, which must be replicated for each iteration.

Figure 17 shows the application of this to multiply. The control loop has been removed and with it the DMgpa(C). The steps have been strung out as a sequence of 8 subprocesses. These can be defined as subroutine calls on the

 

130

previous | contents | next