CodeHow: Effective Code Search based on API Understanding and Extended Boolean Model

Automated Software Engineering (ASE2015)

Published by ACM - Association for Computing Machinery

Over the years of software development, a vast amount of source code has been accumulated. Many code search tools have been proposed to help programmers reuse previously written code by performing free-text queries over a large-scale codebase. Our experience shows that the accuracy of these code search tools is often unsatisfactory. One major reason is that existing tools lack the ability to understand queries.

In this paper, we propose CodeHow, a code search technique that can recognize the potential APIs a user query refers to. Having understood the potentially relevant APIs, CodeHow expands the query with those APIs and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search. We deploy the backend of CodeHow as a Microsoft Azure service and implement the front-end as a Visual Studio extension. We evaluate CodeHow on a large-scale codebase consisting of 26K C# projects downloaded from GitHub. The experimental results show that when only the top-1 result is inspected, CodeHow achieves a precision score of 0.794 (i.e., 79.4% of the first returned results are relevant code snippets). The results also show that CodeHow outperforms conventional code search tools. Furthermore, we perform a controlled experiment and a survey of Microsoft developers. The results confirm the usefulness and effectiveness of CodeHow in programming practice.
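
To make the retrieval step more concrete, the following is a minimal Python sketch of the standard p-norm formulation of the Extended Boolean model, applied to a query that has been expanded with candidate APIs. The abstract does not specify how CodeHow combines its scores, so the function `score_snippet`, the choice to AND the text terms and OR the result with the API matches, and all weights below are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def p_norm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """Extended Boolean OR: high if any term weight is high."""
    if not weights:
        return 0.0
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def p_norm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """Extended Boolean AND: high only if all term weights are high."""
    if not weights:
        return 0.0
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


def score_snippet(text_weights: Sequence[float],
                  api_weights: Sequence[float],
                  p: float = 2.0) -> float:
    """Hypothetical combination (an assumption, not from the paper):
    AND the query-term weights, OR the API-match weights, then OR the two."""
    text_score = p_norm_and(text_weights, p)
    api_score = p_norm_or(api_weights, p)
    return p_norm_or([text_score, api_score], p)


# Two snippets with identical text similarity; only the first one
# calls an API that the expanded query refers to (weights in [0, 1]).
with_api = score_snippet([0.9, 0.8], [1.0])
without_api = score_snippet([0.9, 0.8], [])
print(with_api > without_api)
```

With p = 1 both operators reduce to the plain average of the weights, and as p grows they approach strict Boolean behavior (maximum for OR, minimum for AND). Under the assumptions above, a snippet that matches a recognized API ranks above one with the same text similarity that does not.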