Darko
Systems
RF-DNA
ToleRace
Intrusion Prevention
The Martini Synch
PPMexe
Click Passwords
Fiber-based COA
Field division routing

PPMexe
(with Milenko Drinic and Hoi Vo)
keywords
: compression of binaries - programs - executables - software, PPM, x86 binary compression, PPM model cleaning, functionally-invariant transformations, split-stream compression, instruction rescheduling, dual-alphabet PPM, on-line and run-time software distribution system
contact: darkok@microsoft.com

We introduce a compression mechanism for program binaries that explores their syntax and semantics to achieve superior compression ratios. The foundation of our compression codec is the generic paradigm of prediction by partial matching (PPM [1]). We combine PPM with two pre-processing steps:

  • instruction rescheduling to improve prediction rates and
  • heuristic partitioning of a program binary into streams with high auto-correlation. Ad-hoc split-stream compression for binaries has been introduced by Lucco [2].

We improve the traditional PPM algorithm by:

  • using an additional alphabet of frequent variable-length super-symbols extracted from the input stream of fixed-length symbols.
  • In addition, we introduce a low-overhead mechanism that enables decompression starting from an arbitrary instruction of the executable, a feature pivotal for run-time software delivery.

The compression algorithm is implemented for x86 binaries and tested on several large applications. Binaries compressed using our codec, are 18-24% smaller than files created using off-the-shelf PPMD, one of the best available compressors. We augmented our codec with several improvements that particularly facilitate on-line and run-time software distribution platforms.

Learn more about our system from [3,4]. The following issued patent covers the novel ideas.

  1. J. Cleary and I. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, Vol.32, (no.4), pp.396-402, 1984.
  2. S. Lucco. Split-stream dictionary program compression. ACM SIGPLAN Programming Languages Design and Implementation, pp.27-34, 2000.
  3. M. Drinic, , D. Kirovski, and H. Vo. PPMexe: Program Compression. ACM Transactions on Programming Languages and Systems, Vol.29 , (no.1), 2007.
  4. M. Drinic, D. Kirovski, and M. Potkonjak. PPM Model Cleaning. IEEE Data Compression Conference, pp.163-72, 2003.