Codebook: Discovering and Exploiting Relationships in Software Repositories

Andrew Begel, Khoo Yit Phang, and Thomas Zimmermann


Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help. Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework’s flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems.
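The abstract names one data structure (a graph of people and artifacts) and one algorithm (regular language reachability). As a rough illustration of that idea, and not the authors' implementation, the sketch below searches the product of a labeled graph and a small deterministic automaton, returning every node reachable along a path whose edge labels spell a word the automaton accepts. All node names, edge labels, and the `reachable` function are hypothetical and chosen only for this example.

```python
from collections import deque

def reachable(graph, start, dfa, dfa_start, dfa_accept):
    """Return nodes reachable from `start` along a path whose sequence of
    edge labels is accepted by the automaton (illustrative sketch only)."""
    seen = {(start, dfa_start)}
    queue = deque(seen)
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in dfa_accept:
            results.add(node)
        for label, target in graph.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # this label is not allowed in the current state
            pair = (target, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return results

# Hypothetical graph of people and artifacts: node -> [(edge label, node)].
graph = {
    "alice": [("authored", "commit42")],
    "bob": [("authored", "commit77")],
    "commit42": [("modifies", "parser.c"), ("authored_by", "alice")],
    "commit77": [("modifies", "parser.c"), ("authored_by", "bob")],
    "parser.c": [("modified_by", "commit42"), ("modified_by", "commit77")],
}

# Automaton for the label sequence:
#   authored, modifies, modified_by, authored_by
# i.e. "people who committed to a file that the starting person changed".
dfa = {
    (0, "authored"): 1,
    (1, "modifies"): 2,
    (2, "modified_by"): 3,
    (3, "authored_by"): 4,
}

coworkers = reachable(graph, "alice", dfa, 0, {4})
```

Under these assumptions, `coworkers` contains both `"alice"` and `"bob"`, since each authored a commit that modifies `parser.c`. A different question would only need a different automaton over the same graph, which is the kind of flexibility the abstract attributes to the single-structure, single-algorithm design.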


Publication type: Inproceedings
Published in: Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering
Publisher: Association for Computing Machinery, Inc.