Jeff Carver, Richard Neil Pittman, and Alessandro Forin
An extensible processor provides a standard data-path and one or more regions for use as application-specific reconfigurable logic. In this paper we address two problems that arise in the practical use of extensible processors: using multiple extensible regions can lead to avoidable time and space inefficiencies, and the physical placement of the interconnection points strongly affects the overall design timings. Standard tool-flows from FPGA manufacturers require the creation of a separate configuration image for each region. The space and time costs that this entails are undesirable, especially in an embedded system setting where storage is at a premium. We introduce a run-time algorithm that relocates one configuration image to any number of compatible regions in linear time. The application loader running on the data-path can perform the relocation along with the loading of the application code. We have implemented the algorithm on the eMIPS soft processor using two extensible regions, and on the MicroBlaze soft processor using four regions, in both cases targeting a Virtex-4 FPGA. Image relocation has two main advantages. We save time at compilation because only one region needs to be synthesized, and we save space at execution time because only one configuration is stored in Flash memory. The reconfigurable regions themselves are interfaced with the standard data-path using “bus macros”, connection points that are placed at fixed locations. The placement of the bus macros around a region has a noticeable impact on the timing of the design inside the region and on the timing of the standard data-path outside the region. We have found that manual placement of the bus macros is not only tedious but also leads to sub-optimal timings, even when following best design practices. We present a tool that uses design-space exploration to obtain automatic, near-optimal placement of the bus macros for the relocatable regions.
Results show that the worst solution found had a total timing score of 581,146 ps, the best solution had a score of only 22,964 ps, and the average over the design space was 175,682 ps. The score for the manual placement was 97,714 ps.
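To put the reported scores in proportion, the short Python sketch below computes how each one compares with the best placement found. The four numbers are copied from the results above; the dictionary and the helper names `ratio_to_best` and `reduction_vs` are illustrative assumptions, not part of the tool described here.

```python
# Timing scores in picoseconds, copied from the results reported above.
# The dictionary layout and function names are hypothetical, for illustration only.
scores_ps = {
    "worst": 581_146,
    "average": 175_682,
    "manual": 97_714,
    "best": 22_964,
}


def ratio_to_best(label: str) -> float:
    """Return how many times larger the named score is than the best score."""
    return scores_ps[label] / scores_ps["best"]


def reduction_vs(label: str) -> float:
    """Return the fractional reduction obtained by moving from `label` to the best score."""
    return 1.0 - scores_ps["best"] / scores_ps[label]


if __name__ == "__main__":
    for label in ("worst", "average", "manual"):
        print(
            f"{label:>7}: {ratio_to_best(label):5.2f}x the best score; "
            f"the best score is {reduction_vs(label):.1%} lower"
        )
```

Under these figures the manual placement scores better than the design-space average, yet its score is still more than four times that of the best placement found by exploration.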