Textual Allusions to Artifacts in Software-related Repositories

Gina Venolia

MSR-TR-2006-73

Much of what is written about a software project is soon forgotten. Software repositories are full of valuable information about the project: bug descriptions, check-in messages, email and newsgroup archives, specifications, design documents, product documentation, and product support logs all contain information that can help software developers answer crucial questions about the history, rationale, and future plans of the source code. For a variety of reasons, however, developers rarely turn to these resources when trying to answer such questions. We are building a suite of tools to lower the barriers to accessing these resources: browse, full-text search, artifact-based search, and implicit search. All of these tools depend on an index that represents software-related artifacts and, crucially, the relationships among them, so the quality of each tool depends directly on the quality and quantity of the relationships in the index. This paper discusses an extensible architecture for representing and provisioning artifacts and the relationships among them. The artifacts and relationships form a typed graph, which is provisioned from structured data sources, structured files, and textual allusions to artifacts. We show that allusions contribute a significant portion of the relationships in the graph and are at least partly responsible for making the graph a scale-free network: they cut across data-source boundaries and increase the “small world-ness” of the graph.
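To make the idea of a typed artifact graph provisioned partly from allusions more concrete, here is a minimal sketch. It is not the report's implementation. The report does not say how allusions are recognized, so the regular-expression patterns here are an assumption. The class names `Artifact` and `ArtifactGraph`, the `ALLUSION_PATTERNS` table, the `provision` function, the relation names `"fixes"` and `"mentions"`, and the identifier formats such as "bug 42" are all hypothetical. The example shows one edge coming from a structured source and several more coming from free-text mentions. Those mentions link an email and a check-in across data-source boundaries.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    kind: str  # hypothetical artifact type, e.g. "bug", "checkin", "email"
    key: str   # identifier within the artifact's own data source


@dataclass
class ArtifactGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target) triples

    def add_edge(self, src: Artifact, relation: str, dst: Artifact) -> None:
        self.nodes.update((src, dst))
        self.edges.add((src, relation, dst))

    def degree(self, node: Artifact) -> int:
        # Count edges touching the node in either direction.
        return sum(1 for s, _, d in self.edges if node in (s, d))


# Hypothetical allusion patterns: textual mentions that name another artifact.
ALLUSION_PATTERNS = {
    "bug": re.compile(r"\bbug\s*#?(\d+)", re.IGNORECASE),
    "checkin": re.compile(r"\bchange(?:list)?\s*#?(\d+)", re.IGNORECASE),
}


def find_allusions(text: str):
    """Yield every artifact that the free text appears to refer to."""
    for kind, pattern in ALLUSION_PATTERNS.items():
        for match in pattern.finditer(text):
            yield Artifact(kind, match.group(1))


def provision(graph: ArtifactGraph, artifact: Artifact, text: str) -> None:
    """Add a 'mentions' edge from an artifact to each artifact its text alludes to."""
    graph.nodes.add(artifact)
    for target in find_allusions(text):
        if target != artifact:
            graph.add_edge(artifact, "mentions", target)


g = ArtifactGraph()
# Structured source: a tracker explicitly records that check-in 7 fixes bug 42.
g.add_edge(Artifact("checkin", "7"), "fixes", Artifact("bug", "42"))
# Allusions found in free text from two other data sources.
provision(g, Artifact("email", "m1"), "Is bug #42 related to change 7?")
provision(g, Artifact("checkin", "8"), "Follow-up to changelist 7; see Bug 42.")

allusion_edges = sum(1 for _, rel, _ in g.edges if rel == "mentions")
print(len(g.edges), allusion_edges, g.degree(Artifact("bug", "42")))
```

In this toy run, four of the five edges come from allusions rather than from the structured source. The email, which no structured link connects to anything, ends up joined to both the bug and the check-in. That shows, on a very small scale, how allusions can cut across data-source boundaries.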