Finding and Reproducing Heisenbugs in Concurrent Programs

OSDI '08: Eighth Symposium on Operating Systems Design & Implementation

Published by USENIX

Concurrency is pervasive in large systems. Unexpected interference among threads often results in “Heisenbugs” that are extremely difficult to reproduce and eliminate. We have implemented a tool called Chess for finding and reproducing such bugs. When attached to a program, Chess takes control of thread scheduling and uses efficient search techniques to drive the program through possible thread interleavings. This systematic exploration of program behavior enables Chess to quickly uncover bugs that might otherwise have remained hidden for a long time. For each bug, Chess consistently reproduces an erroneous execution manifesting the bug, thereby making it significantly easier to debug the problem. Chess scales to large concurrent programs and has found numerous bugs in existing systems that had been tested extensively prior to being tested by Chess. Chess has been integrated into the test frameworks of many code bases inside Microsoft and is used by testers on a daily basis.
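
To make the idea of controlled scheduling concrete, here is a minimal sketch in Python. It is not Chess's API or its search algorithm: it swaps in plain exhaustive enumeration for the efficient search techniques the abstract mentions. The names `increment`, `interleavings`, and `run` are hypothetical, chosen only for illustration. Each "thread" is a generator whose `yield` marks a scheduling point, so the scheduler alone decides which thread takes the next step, and any failing schedule can be replayed exactly.

```python
def increment(shared):
    """A simulated thread doing a non-atomic increment (hypothetical example)."""
    local = shared["counter"]        # step 1: read the shared value
    yield                            # scheduling point: another thread may run here
    shared["counter"] = local + 1    # step 2: write back a possibly stale value


def interleavings(steps):
    """Yield every schedule, as a tuple of thread ids, for the given step counts."""
    if not any(steps):
        yield ()
        return
    for tid, left in enumerate(steps):
        if left:
            rest = steps[:tid] + (left - 1,) + steps[tid + 1:]
            for tail in interleavings(rest):
                yield (tid,) + tail


def run(schedule):
    """Execute two simulated threads under one fixed schedule; return the counter."""
    shared = {"counter": 0}
    threads = [increment(shared), increment(shared)]
    for tid in schedule:
        try:
            next(threads[tid])       # advance the chosen thread by exactly one step
        except StopIteration:
            pass                     # the thread finished on this step
    return shared["counter"]


# Explore all schedules of two threads with two steps each and keep the failing ones.
buggy = [s for s in interleavings((2, 2)) if run(s) != 2]
print(len(buggy), "of 6 schedules lose an update; first:", buggy[0])

# Replaying a recorded schedule reproduces the same erroneous execution every time.
assert all(run(buggy[0]) == 1 for _ in range(100))
```

Because the schedule is an explicit value rather than an accident of timing, the lost update stops being a Heisenbug in this toy model: the same tuple of thread ids always produces the same wrong result. A real tool has to intercept actual synchronization and threading operations and prune a far larger search space, which this sketch does not attempt.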