Because of their flexibility and cost effectiveness, software implementations of number-theoretic cryptographic algorithms (e.g., RSA and Diffie-Hellman) are often desired. In order to obtain the required level of performance (speed) on a selected platform, the developers turn to algorithm-level optimizations and assembly language programming. In this paper, we examine these implementation issues and propose a design methodology for obtaining high-speed implementations. We show that between the full assembler implementation and the standard C implementation, there is a design option in which only a small number of code segments (kernel operations) are written in assembler in order to obtain a significant portion of the speed increase gained by the full assembler implementation. We propose a small set of kernel operations which are as simple as a{\cdot} b+c, where the numbers a, b, c are 1-word integers. The results of our experiments on several processors are also summarized.

}, author = {Cetin K. Koc and Tolga Acar}, month = {December}, title = {High-Speed Algorithms & Architectures For Number-Theoretic Cryptosystems}, url = {http://research.microsoft.com/apps/pubs/default.aspx?id=103204}, year = {1997}, }