Traditionally, FPGAs have been confined to limited roles as small, low-volume ASIC replacements and as circuit emulators. However, continued Moore's law scaling has given FPGAs new life as accelerators for applications that map well to fine-grained parallel substrates. Examples of such applications include processor modeling, application-specific compression, and digital signal processing.
Although FPGAs continue to increase in size, some interesting designs still fail to fit on a single FPGA. Many tools exist that partition RTL descriptions across FPGAs. Unfortunately, these tools deliver low performance because preserving the cycle-by-cycle behavior of RTL across discrete FPGAs is inefficient. This makes them unsuitable for FPGA program acceleration, since the purpose of an accelerator is to make applications run faster.
In this talk, I present latency-insensitive channels, a language-level mechanism by which programmers mark the points in a design at which the compiler may modify the design's cycle-by-cycle behavior. By decoupling the timing of portions of the RTL from the high-level function of the program, designs may be mapped to multiple FPGAs without suffering the performance degradation observed in existing tools. I will also detail the latency-insensitive module compiler, which automates the implementation of designs described in terms of latency-insensitive channels on arbitrary networks of FPGAs. Using a diverse set of preexisting designs, I will demonstrate that latency-insensitive programs obtain significant gains in design feasibility, compilation time, and run time when mapped to multiple FPGAs.
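
As a rough illustration of the idea, the following Python sketch models two modules joined by a channel whose latency can be changed freely. It is a toy software model only, not the language mechanism or the compiler presented in the talk; the names Channel and run, the capacity of 2, and the arithmetic performed by each module are all hypothetical choices made for the example. The point it tries to show is that when both ends fire only on "data available" and "space available", the computed values stay the same and only the cycle count changes as latency grows.

```python
from collections import deque


class Channel:
    """Toy FIFO with a fixed delivery latency in cycles (hypothetical model)."""

    def __init__(self, latency, capacity=2):
        self.latency = latency
        self.capacity = capacity
        self.in_flight = deque()  # entries are (ready_cycle, value)

    def can_send(self):
        # Sender may fire only when the channel has buffer space.
        return len(self.in_flight) < self.capacity

    def send(self, value, now):
        self.in_flight.append((now + self.latency, value))

    def can_recv(self, now):
        # Receiver may fire only when the oldest value has arrived.
        return bool(self.in_flight) and self.in_flight[0][0] <= now

    def recv(self):
        return self.in_flight.popleft()[1]


def run(latency, inputs):
    """Simulate producer -> channel -> consumer; return (outputs, cycles)."""
    chan = Channel(latency)
    pending = deque(inputs)
    outputs = []
    cycle = 0
    while len(outputs) < len(inputs):
        # Consumer module: doubles each value it receives.
        if chan.can_recv(cycle):
            outputs.append(chan.recv() * 2)
        # Producer module: increments each input before sending it.
        if pending and chan.can_send():
            chan.send(pending.popleft() + 1, cycle)
        cycle += 1
    return outputs, cycle


if __name__ == "__main__":
    for latency in (0, 5):
        outputs, cycles = run(latency, [1, 2, 3])
        print(f"latency={latency}: outputs={outputs}, cycles={cycles}")
```

In this model, a latency of 0 stands in for a channel that stays on one FPGA and a larger latency stands in for one that crosses between FPGAs. Neither module is written differently in the two cases, which is the property that would let a compiler choose where to cut the design.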