|
|

PPMexe
(with Milenko Drinic and Hoi Vo)
keywords: compression of binaries - programs - executables - software, PPM, x86
binary compression, PPM model cleaning, functionally-invariant
transformations, split-stream compression, instruction rescheduling,
dual-alphabet PPM, on-line and run-time software distribution system
contact:
darkok@microsoft.com
We introduce a compression mechanism for
program binaries that explores their syntax and semantics to achieve
superior compression ratios. The foundation of our compression codec is
the generic paradigm of prediction by partial matching (PPM [1]). We
combine PPM with two pre-processing steps:
- instruction rescheduling to improve
prediction rates and
- heuristic partitioning of a program
binary into streams with high auto-correlation. Ad-hoc split-stream
compression for binaries has been introduced by Lucco [2].
We improve the traditional PPM algorithm
by:
- using an additional alphabet of
frequent variable-length super-symbols extracted from the input
stream of fixed-length symbols.
- In addition, we introduce a
low-overhead mechanism that enables decompression starting from an
arbitrary instruction of the executable, a feature pivotal for
run-time software delivery.
The compression algorithm is implemented
for x86 binaries and tested on several large applications. Binaries
compressed using our codec, are 18-24% smaller than files created using
off-the-shelf PPMD, one of the best available compressors. We augmented
our codec with several improvements that particularly facilitate on-line
and run-time software distribution platforms.
Learn more about our system from [3,4]. The following
issued
patent
covers the novel ideas.
- J. Cleary and I. Witten.
Data compression using adaptive coding and partial string matching.
IEEE Transactions on Communications, Vol.32, (no.4), pp.396-402,
1984.
- S. Lucco.
Split-stream dictionary program compression. ACM SIGPLAN
Programming Languages Design and Implementation, pp.27-34, 2000.
- M. Drinic, , D. Kirovski, and
H. Vo.
PPMexe:
Program Compression. ACM Transactions on Programming
Languages and Systems, Vol.29 , (no.1), 2007.
- M. Drinic, D. Kirovski, and
M. Potkonjak.
PPM Model Cleaning.
IEEE Data Compression Conference, pp.163-72, 2003.
|