Performance of Firefly RPC

  • Mike Schroeder ,
  • Michael Burrows

ACM Transaction on Computer Systems |

Publication

In this paper we report on the performance of the remote procedure call (RPC) implementation for the Firefly multiprocessor and analyze the implementation to account precisely for all measured latency. From the analysis and measurements, we estimate how much faster RPC could be if certain improvements were made. The elapsed time for an intermachine call to a remote procedure that accepts no arguments and produces no results is 2.66 ms. The elapsed time for an RPC that has a single 1440-byte result (the maximum result that will fit in a single packet) is 6.35 ms. Maximum intermachine throughput of application program data using RPC is 4.65 Mbits/s, achieved with four threads making parallel RPCs that return the maximum-size result that fits in a single RPC result packet. CPU utilization at maximum throughput is about 1.2 CPU seconds per second on the calling machine and a little less on the server. These measurements are for RPCs from user space on one machine to user space on another, using the installed syst