Overcoming Memory Latency And Enabling Parallelism With The Greedy CAM Architecture

MSR-TR-2007-174

Proc. of IEEE Infocom 2006

Computing today is inexorably headed towards increased parallelism. Taking a hybrid dataflow approach, the Greedy CAM architecture addresses programmability along with some of the realities of today’s hardware, such as the clock rate plateau and the curse of memory latency. Greedy CAM uses tags more generally than previous tagged-token architectures: the tags are related functionally at the inputs to each computational kernel. A Content Addressable Memory (CAM) works with these functional tag relationships to deterministically prefetch operands and feed a pipelined computing engine. Beyond simple contiguous or stride-based operand accesses, the CAM enables truly random memory access patterns and conditional execution in a form that is friendly to a parallel, pipelined machine. Together, the functional tag relationships and the CAM provide main memory, synchronization and RAM abstraction. Kernels run on the machine under a coarse-grained dataflow control scheme that can dynamically expand or contract at runtime to fit the available parallel hardware. Lastly, the architecture lends itself to several implementations, including reconfigurable hardware or a software approach on a many-core-like architecture.
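
The idea of functionally related tags can be illustrated with a small software sketch. This is not the paper's design: the class ToyCAM, the function run_kernel, the (name, index) tag layout and the example tag functions are all hypothetical, and a Python dict stands in for the associative lookup of a real CAM. The sketch assumes each kernel input is named by a function of the output tag, so every operand address is known before the kernel runs (which is what would make prefetching deterministic), and it assumes that a missing operand simply keeps that instance from firing.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence

Tag = Hashable


class ToyCAM:
    """Dict-backed stand-in for a CAM: values are found by tag, not by address."""

    def __init__(self) -> None:
        self._cells: Dict[Tag, float] = {}

    def write(self, tag: Tag, value: float) -> None:
        self._cells[tag] = value

    def match(self, tag: Tag) -> Optional[float]:
        # Returns None when no cell carries this tag.
        return self._cells.get(tag)


def run_kernel(
    cam: ToyCAM,
    out_tags: Sequence[Tag],
    input_tag_fns: Sequence[Callable[[Tag], Tag]],
    op: Callable[..., float],
) -> List[Tag]:
    """Fire `op` once per output tag whose operands are all present.

    Each function in `input_tag_fns` maps an output tag to the tag of one
    operand, so the full operand list is computable ahead of execution.
    """
    fired: List[Tag] = []
    for out_tag in out_tags:
        operands = [cam.match(fn(out_tag)) for fn in input_tag_fns]
        if any(v is None for v in operands):
            continue  # assumed behaviour: an absent operand suppresses firing
        cam.write(out_tag, op(*operands))
        fired.append(out_tag)
    return fired


def demo():
    cam = ToyCAM()
    # Load x[0..7] except x[5], which is deliberately left absent.
    for i in range(8):
        if i != 5:
            cam.write(("x", i), 10.0 * i)

    # y[i] = x[i] + x[(3 * i) % 8]: the second operand is neither
    # contiguous nor a fixed stride in i.
    same_index = lambda t: ("x", t[1])
    scattered = lambda t: ("x", (3 * t[1]) % 8)

    out_tags = [("y", i) for i in range(8)]
    fired = run_kernel(cam, out_tags, [same_index, scattered], lambda a, b: a + b)
    return cam, fired
```

In demo(), the instances y[5] and y[7] both depend on the absent x[5], so they do not fire, while the remaining six instances do. The point of the sketch is only that tag functions let a non-contiguous access pattern and a data-dependent skip be expressed without the kernel computing addresses itself.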