Andrew Begel is a researcher in the Human Interactions in Programming group at Microsoft Research.
- Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel, The Emerging Role of Data Scientists on Software Development Teams, no. MSR-TR-2015-30, 12 April 2015.
Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their raison d’être in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists and describe a set of strategies that they employ to increase the impact and actionability of their work.
- Andrew Begel, Thomas Fritz, Sebastian Mueller, Serap Yigit-Elliott, and Manuela Zueger, Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development, in Proceedings of the International Conference on Software Engineering, 4 June 2014.
Software developers make programming mistakes that cause serious bugs for their customers. Existing work to detect problematic software focuses mainly on post hoc identification of correlations between bug fixes and code. We propose a new approach to address this problem: detect when software developers are experiencing difficulty while they work on their programming tasks, and stop them before they can introduce bugs into the code. In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult. We can predict nominal task difficulty (easy/difficult) for a new developer with 64.99% precision and 64.58% recall, and for a new task with 84.38% precision and 69.79% recall. We can improve the Naive Bayes classifier’s performance if we train it on just the eye-tracking data over the entire dataset, or if we use a sliding-window data collection scheme with a 55-second time window. Our work brings the community closer to a viable and reliable measure of task difficulty that could power the next generation of programming support tools.
- Andrew Begel and Thomas Zimmermann, Analyze This! 145 Questions for Data Scientists in Software Engineering, in Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), ACM, June 2014.
In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.
The data appendix for this paper is here: http://research.microsoft.com/apps/pubs/?id=200784.
- Andrew Begel and Thomas Zimmermann, Analyze This! 145 Questions for Data Scientists in Software Engineering, no. MSR-TR-2013-111, 28 October 2013.
In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like to ask data scientists to investigate about software, software processes and practices, and about software engineers. Our analysis resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate the 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also see opposition to questions that assess the performance of individual employees or compare them to one another. Our categorization and catalog of 145 questions will help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.
This technical report has been published at the ICSE 2014 conference. For the definitive version, please refer to the version published in the conference proceedings: http://research.microsoft.com/apps/pubs/default.aspx?id=208800
The data appendix for this paper is here: http://research.microsoft.com/apps/pubs/?id=200784.
- Brendan Murphy, Christian Bird, Thomas Zimmermann, Laurie Williams, Nachiappan Nagappan, and Andrew Begel, Have Agile Techniques been the Silver Bullet for Software Development at Microsoft?, ACM ESEM, 11 October 2013.
Background. The pressure to release high-quality, valuable software products at an increasingly faster rate is forcing software development organizations to adapt their development practices. Agile techniques began emerging in the mid-1990s in response to this pressure and to increased volatility of customer requirements and technical change. Theoretically, agile techniques seem to be the silver bullet for responding to these pressures on the software industry. Aims. This paper tracks the changing attitudes to agile adoption and techniques within Microsoft, in one of the largest longitudinal surveys of its kind (2006-2012). Method. We collected the opinions of 1,969 agile and non-agile practitioners in five surveys over a six-year period. Results. The survey results reveal that despite intense market pressure, the growth of agile adoption at Microsoft is slower than would be expected. Additionally, no individual agile practice exhibited strong growth trends. We also found that while development practices of teams may be similar, some perceive and declare themselves to be following an agile methodology while others do not. Both agile and non-agile practitioners agree on the relative benefits and problem areas of agile techniques. Conclusions. We found no clear trends in practice adoption. Non-agile practitioners are less enamored of the benefits and more strongly in agreement with the problem areas. Respondents in general were concerned about whether agile practices can be used by large-scale teams, which may limit their future adoption.
- Andrew Begel and Thomas Zimmermann, Appendix to Analyze This! 145 Questions for Data Scientists in Software Engineering, no. MSR-TR-2013-84, 14 September 2013.
In order to understand the questions that software engineers would like to ask data scientists about software, the software process, and about software engineering practices, we conducted two surveys: the first survey solicited questions and the second survey ranked a set of questions. Our analysis resulted in a catalog of 145 questions grouped into 12 categories as well as a ranking of the importance of each question. This technical report contains the survey text as well as the complete list of 145 questions with rankings.
This data is used by a publication at the ICSE 2014 conference. To read the paper, please refer to the version published in the conference proceedings: http://research.microsoft.com/apps/pubs/default.aspx?id=208800
- Anja Guzzi and Andrew Begel, Facilitating Communication between Engineers with CARES, in Proceedings of the International Conference on Software Engineering, IEEE, 6 June 2012.
When software developers need to exchange information or coordinate work with colleagues on other teams, they are often faced with the challenge of finding the right person to communicate with. In this paper, we present our tool, called CARES (Colleagues and Relevant Engineers' Support), an integrated development environment (IDE)-based tool that enables engineers to easily discover and communicate with the people who have contributed to the source code. CARES has been deployed to 30 professional developers, and we interviewed 8 of them after 3 weeks of evaluation. They reported that CARES helped them to more quickly find, choose, and initiate contact with the most relevant and expedient person who could address their needs.
- Andrew Begel and Libby Hemphill, Not Seen and Not Heard, no. MSR-TR-2011-136, 25 April 2011.
Virtual teams, in which the members work from multiple locations, have become a common feature at many global organizations. In spite of this new reality, collocated teams experience difficulties in adapting their established processes and practices for a newly virtual working environment, greatly impeding their performance, productivity, and morale. In this paper, we present findings from a qualitative case study of five software teams that hired and onboarded their first remote team member. Our analyses focus on three underappreciated aspects of the virtual onboarding process: trying to learn team practices as the team changes them, building and maintaining social relationships with physically remote teammates, and evaluating and managing expectations of performance from afar. From the results of our analyses, we pose seven propositions about virtual onboarding that should be explored in future studies.
- Andrew Begel, Rob DeLine, and Thomas Zimmermann, Social Media for Software Engineering, in Proceedings of the FSE/SDP Workshop on the Future of Software Engineering Research (FoSER), Association for Computing Machinery, Inc., November 2010.
Social media has changed the way that people collaborate and share information. In this paper, we highlight its impact in enabling new ways for software teams to form and work together. Individuals will self-organize within and across organizational boundaries. Grassroots software development communities will emerge centered around new technologies, common processes, and attractive target markets. Companies consisting of lone individuals will be able to leverage social media to conceive of, design, develop, and deploy successful and profitable product lines. A challenge for researchers who are interested in studying, influencing, and supporting this shift in software teaming is to make sure that their research methods protect the privacy and reputation of their stakeholders.
- Andrew Begel, Khoo Yit Phang, and Thomas Zimmermann, WhoseIsThat: Finding Software Engineers with Codebook (Research Demo), in Proceedings of the 16th International Symposium on Foundations of Software Engineering (FSE), Association for Computing Machinery, Inc., November 2010.
In this demo, we describe WhoseIsThat, a social search portal which we built using the Codebook framework. We improve the search experience in two ways: first, we search across multiple software repositories at once with a single query; second, we return not just a list of artifacts in the results, but also engineers.
- Andrew Begel and Thomas Zimmermann, Keeping up with your Friends: Function Foo, Library Bar.DLL, and Work Item 24, in Proceedings of Web2SE: First Workshop on Web 2.0 for Software Engineering, Association for Computing Machinery, Inc., 4 May 2010.
Development teams who work with others need to be aware of what everyone is doing in order to manage the risk of taking on dependencies. Using newsfeeds of software development activities mined from software repositories, teams can find relevant information to help them make well-informed decisions that affect the success of their endeavors. In this paper, we describe the architecture of a newsfeed system that we are currently building on top of the Codebook software repository mining platform. We discuss the design, construction and aggregation of newsfeeds, and include other important aspects such as summarization, filtering, context, and privacy.
- Andrew Begel, Khoo Yit Phang, and Thomas Zimmermann, Codebook: Discovering and Exploiting Relationships in Software Repositories, in Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering, Association for Computing Machinery, Inc., 2 May 2010.
Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help. Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework’s flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems.
- Andrew Begel and Nachiappan Nagappan, Coordination in Large-Scale Software Development: Helpful and Unhelpful Behaviors, no. MSR-TR-2009-135, 28 September 2009.
Large-scale software development requires coordination within and between very large engineering teams which may be located in different buildings, on different company campuses, and in different time zones. At Microsoft Corporation, we studied a 3-year-old, 300-person software application team based in Redmond, WA to learn how they coordinate with three intra-organization, physically distributed dependencies: a platform library team also in Redmond; a team three time zones away in Boston, MA; and a team in Hyderabad, India. Thirty-one interviews with 26 team members revealed that coordination was most impacted by issues of communication, capacity and cooperation. Distributed teams faced additional challenges due to time zone and cultural differences between the team members. We support our findings with a survey of 775 engineers across Microsoft who described their experiences managing coordination in their own software products. We suggest new processes and tools to improve team coordination.
- Andrew Begel and Robert DeLine, Codebook: Social Networking over Code, in Proceedings of ICSE 09 (New Ideas and Emerging Results), Association for Computing Machinery, Inc., June 2009.
Social networking systems help people maintain connections to their friends, enabling awareness, communication, and collaboration, especially at a distance. In many studies of coordination in software engineering, the work artifacts, e.g. code, bugs, specifications, are themselves the objects that link engineers together. In this paper, we introduce Codebook, a social networking web service in which people can be “friends” not only with other people but with the work artifacts they share with them. Providing a web interface to the graph of these connections will enable software engineers to keep track of task dependencies, discover and maintain connections to other teams, and understand the history and rationale behind the code that they work on and use.
- Andrew Begel, Nachiappan Nagappan, Christopher Poile, and Lucas Layman, Coordination in large-scale software teams, in Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering (CHASE), IEEE Computer Society, Washington, DC, USA, May 2009.
Large-scale software development requires coordination within and between very large engineering teams which may be located in different buildings, on different company campuses, and in different time zones. From a survey answered by 775 Microsoft software engineers, we learned how work was coordinated within and between teams and how engineers felt about their success at these tasks. The respondents revealed that the most common objects of coordination are schedules and features, not code or interfaces, and that more communication and personal contact worked better to make interactions between teams go more smoothly.
- Andrew Begel and Nachiappan Nagappan, Pair programming: what's in it for me?, in ESEM '08: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ACM, New York, NY, USA, October 2008.
- Andrew Begel and Beth Simon, Novice Software Developers, All Over Again, in ICER '08: Proceedings of the Fourth International Workshop on Computing Education Research, ACM, New York, NY, USA, September 2008.
Transitions from novice to expert often cause stress and anxiety and require specialized instruction and support to enact efficiently. While many studies have looked at novice computer science students, very little research has been conducted on professional novices. We conducted a two-month in-situ qualitative case study of new software developers in their first six months working at Microsoft. We shadowed them in all aspects of their jobs: coding, debugging, designing, and engaging with their team, and analyzed the types of tasks in which they engage. We can explain many of the behaviors revealed by our analyses if viewed through the lens of newcomer socialization from the field of organizational management. This new perspective also enables us to better understand how current computer science pedagogy prepares students for jobs in the software industry. We consider the implications of this data and analysis for developing new processes for learning in both university and industrial settings to help accelerate the transition from novice to expert software developer.
- Andrew Begel and Nachiappan Nagappan, Global Software Development: Who Does It?, in International Conference on Global Software Engineering, IEEE Computer Society, August 2008.
In today’s world, software development is increasingly spread across national and geographic boundaries. There is limited empirical evidence about the number and distribution of people in a large software company who have to deal with global software development (GSD). Is GSD restricted to a select few in a company? How many time zones do engineers have to deal with? Do managers have to deal with GSD more than individual engineers? What are the benefits and problems that engineers see with GSD? How have they tried to improve GSD coordination? These are interesting questions to be addressed in an empirical context. In this paper, we report on the results of a large-scale survey of software engineers at Microsoft Corporation. We found that a very high proportion of engineers are directly involved with GSD. In addition, more than 50% of the respondents regularly collaborate with people more than three time zones away. Engineers also report that communication difficulties around coordination are the most critical issues with GSD, yet also difficult to solve.
- Reid Holmes and Andrew Begel, Deep Intellisense: A Tool for Rehydrating Evaporated Information, in Proceedings of the 2008 International Working Conference on Mining Software Repositories, ACM, New York, NY, USA, May 2008.
- Lucas Layman, Nachiappan Nagappan, Sam Guckenheimer, Jeff Beehler, and Andrew Begel, Mining Software Effort Data: Preliminary Analysis of Visual Studio Team System Data, in Proceedings of the 2008 International Working Conference on Mining Software Repositories, ACM, New York, NY, USA, May 2008.
In the software development process, scheduling and predictability are important components to delivering a product on time and within budget. Effort estimation artifacts offer a rich data set for improving scheduling accuracy and understanding the development process. Effort estimation data for 55 features in the latest release of Visual Studio Team System (VSTS) were collected and analyzed for trends, patterns, and differences. Statistical analysis shows that actual estimation error was positively correlated with feature size, and that in-process metrics of estimation error were also correlated with the final estimation error. These findings suggest that smaller features can be estimated more accurately, and that in-process estimation error metrics can provide a quantitative supplement to developer intuition regarding high-risk features during the development process.
- Andrew Begel and Beth Simon, Struggles of new college graduates in their first software development job, in Proceedings of the 39th Technical Symposium on Computer Science Education, ACM, New York, NY, USA, March 2008.
How do new college graduates experience their first software development jobs? In what ways are they prepared by their educational experiences, and in what ways do they struggle to be productive in their new positions? We report on a "fly-on-the-wall" observational study of eight recent college graduates in their first six months of a software development position at Microsoft Corporation. After a total of 85 hours of on-the-job observation, we report on the common abilities evidenced by new software developers including how to program, how to write design specifications, and evidence of persistence strategies for problem-solving. We also classify some of the common ways new software developers were observed getting stuck: communication, collaboration, technical, cognition, and orientation. We report on some common misconceptions of new developers which often frustrate them and hinder them in their jobs, and conclude with recommendations to align Computer Science curricula with the observed needs of new professional developers.
- Andrew Begel, Effecting Change: Coordination in Large-scale Software Development, in Proceedings of the 2008 International Workshop on Cooperative and Human Aspects of Software Engineering, ACM, New York, NY, USA, 2008.
- Andrew Begel and Nachiappan Nagappan, Usage and Perceptions of Agile Software Development in an Industrial Context: An Exploratory Study, in First International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society, September 2007.
Agile development methodologies have been gaining acceptance in the mainstream software development community. While there are numerous studies of Agile development in academic and educational settings, there has been little detailed reporting of the usage, penetration and success of Agile methodologies in traditional, professional software development organizations. We report on the results of an empirical study conducted at Microsoft to learn about Agile development and its perception by people in development, testing, and management. We found that one-third of the study respondents use Agile methodologies to varying degrees, and most view it favorably due to improved communication between team members, quick releases and the increased flexibility of Agile designs. The Scrum variant of Agile methodologies is by far the most popular at Microsoft. Our findings also indicate that developers are most worried about scaling Agile to larger projects (greater than twenty members), attending too many meetings and th
- Andrew Begel, Help, I Need Somebody!, in Proceedings of the CSCW Workshop: Supporting the Social Side of Large-Scale Software Development, Association for Computing Machinery, Inc., November 2006.
Information discovery is a very difficult and frustrating aspect of software development. Novice developers are often assigned a mentor who preemptively provides answers and advice without requiring the novice to explicitly ask for help. A similar situation occurs among expert developers in radically collocated settings. The close proximity enhances communication between all members of a group, providing needed information, often preemptively due to ambient awareness of other developers. In this paper, we propose a mechanism to extend this desirable property of preemptive mentoring to developers in more traditional software engineering environments. The proposed system will infer when and how a developer becomes blocked looking for information, and notify an appropriate expert to come to his aid. We believe that this preemptive help will lower developer frustration and enhance diffusion of expert knowledge throughout an organization.
- Andrew Begel and Susan L. Graham, An Assessment of a Speech-Based Programming Environment, in Proceedings of the Symposium on Visual Languages and Human-Centric Computing, IEEE Computer Society, Washington, DC, USA, September 2006.
Programmers who suffer from repetitive stress injuries find it difficult to program by typing. Speech interfaces can reduce the amount of typing, but existing programming-by-voice tools make it awkward for programmers to enter and edit program text. We used a human-centric approach to address these problems. We first studied how programmers verbalize code, and found that spoken programs contain lexical, syntactic and semantic ambiguities that do not appear in written programs. Using the results from this study, we designed Spoken Java, a syntactically similar, yet semantically identical variant of Java that is easier to speak. We built an Eclipse IDE plugin called SPEED (for SPEech EDitor) to support the combination of Spoken Java and an associated command language. In this paper, we report the results of the first study ever of any working programming-by-voice system. Our evaluation with expert Java developers showed that most developers had little trouble learning to use the system via spoken commands, but were reluctant to speak literal code out loud. As expected, programmers found programming by voice to be slower than typing.