Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Rob DeLine

Rob DeLine
PRINCIPAL RESEARCHER
.

To make a prairie it takes a clover and one bee,—
One clover, and a bee,
And revery.
The revery alone will do
If bees are few.
— Emily Dickinson, 1830-1886

 

Over the years, my research projects have spanned user interfaces, software engineering and type theory, but they all share a common goal: to make it easier to produce usable, reliable software. When you observe the work practice of an experienced professional, like a surgeon or a car mechanic, you see efficient, graceful use of task-appropriate tools. In contrast, if you watch an experienced software developer doing an every-day task, you see fumbling, confusion and frustration. Software developers are every bit as trained and talented, but their tools and processes are often poorly suited for their tasks.

My group at Microsoft Research, Human Interactions in Programming (HIP), applies user-centered design to software development: studying developers both in the lab and in the field; understanding what is difficult about their typical tasks; building new tools to make those tasks easier; and evaluating those tools with developers. My recent research studies recommender systems for team newcomers, the use of spatial memory to navigate large code bases, retaining knowledge in long-lived projects, and patterns of communication and interruption in co-located and geographically distributed development teams.

Curriculum Vitae (including complete publication list)
Facebook page

Recent talks
Recent publications
  • Leilani Battle, Danyel Fisher, Robert DeLine, Mike Barnett, Badrish Chandramouli, and Jonatahn Goldstein, Making Sense of Temporal Queries with Interactive Visualization, in Proceedings of the 34th Conference on Human-Computer Interaction (CHI 2016), ACM – Association for Computing Machinery, 7 May 2016.

    As real-time monitoring and analysis become increasingly important, researchers and developers turn to data stream management systems (DSMS’s) for fast, efficient ways to pose temporal queries over their datasets. However, these systems are inherently complex, and even database experts find it difficult to understand the behavior of DSMS queries. To help analysts better understand these temporal queries, we developed StreamTrace, an interactive visualization tool that breaks down how a temporal query processes a given dataset, step-by-step. The design of StreamTrace is based on input from expert DSMS users; we evaluated the system with a lab study of programmers who were new to streaming queries. Results from the study demonstrate that StreamTrace can help users to verify that queries behave as expected and to isolate the regions of a query that may be causing unexpected results.

  • Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel, The Emerging Role of Data Scientists on Software Development Teams, in Proceedings of the 38th International Conference on Software Engineering (ICSE 2016), ACM – Association for Computing Machinery, May 2016.

    Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.

  • Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel, Appendix to The Emerging Role of Data Scientists on Software Development Teams, no. MSR-TR-2016-4, 10 February 2016.

    This technical report contains supplemental materials (hierarchical taxonomy of codes, selected quotes) to the paper "The Emerging Role of Data Scientists on Software Development Teams" published at ICSE 2016.

  • Titus Barik, Robert DeLine, Steven Drucker, and Danyel Fisher, The Bones of the System: A Study of Logging and Telemetry at Microsoft, no. MSR-TR-2015-79, 26 October 2015.

    Large software organizations are transitioning to event data platforms as they culturally shift to better support data-driven decision making. This paper offers a case study at Microsoft during such a transition. Through qualitative interviews of 28 participants, and a quantitative survey of
    1,823 respondents, we catalog a diverse set of activities that leverage event data sources, identify challenges in conducting these activities, and describe tensions that emerge in data-driven cultures as event data flow through these activities within the organization. We find that the use of event data span every job role in our interviews and survey, that different perspectives on event data create tensions between roles or teams, and that professionals report social and technical challenges across activities.

  • Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing, Trill: A High-Performance Incremental Query Processor for Diverse Analytics, VLDB – Very Large Data Bases, August 2015.

    This paper introduces Trill – a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration : Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill’s throughput is high across the latency spectrum. For streaming data, Trill’s throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill’s throughput is comparable to a major modern commercial columnar DBMS. Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill’s new design and architecture, and report experimental results that demonstrate Trill’s high performance across diverse analytics scenarios. We also describe how Trill’s ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.

  • Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel, The Emerging Role of Data Scientists on Software Development Teams, no. MSR-TR-2015-30, 12 April 2015.

    Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their raison d’être in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists and describe a set of strategies that they employ to increase the impact and actionability of their work.

  • Emanuel Zgraggen, Steven M. Drucker, Danyel Fisher, and Robert DeLine, (s|qu)eries: Visual Regular Expressions for Querying and Exploring Event Sequences, ACM – Association for Computing Machinery, April 2015.

    Many different domains collect event sequence data and rely on finding and analyzing patterns within it to gain meaningful insights. Current systems that support such queries either provide limited expressiveness, hinder exploratory workflows or present interaction and visualization models which do not scale well to large and multi-faceted data sets. In this paper we present (s|qu)eries (pronounced “Squeries”), a visual query interface for creating queries on sequences (series) of data, based on regular expressions. (s jqu)eries is a touchbased system that exposes the full expressive power of regular expressions in an approachable way and interleaves query specification with result visualizations. Being able to visually investigate the results of different query-parts supports debugging and encourages iterative query-building as well as exploratory work-flows. We validate our design and implementation through a set of informal interviews with data scientists that analyze event sequences on a daily basis.

  • Danyel Fisher, Badrish Chandramouli, Robert DeLine, Jonathan Goldstein, Andrei Aron, Mike Barnett, John C. Platt, James F. Terwilliger, and John Wernsing, Tempe: An Interactive Data Science Environment for Exploration of Temporal and Streaming Data, no. MSR-TR-2014-148, November 2014.

    Over the last two decades, data scientists performed increasingly sophisticated analyses on larger data sets, yet their tools and workflows remain low-level. A typical analysis involves different tools for different stages of the work, requiring file transfers and considerable care to keep everything organized. Temporal data adds additional complexity: users typically must write queries offline before porting them to production systems. To address these problems, this paper introduces Tempe, a web application providing an integrated, collaborative environment for both real-time and offline temporal data analysis. Tempe's central concept is a persistent research notebook retaining data sources, analysis steps and results. Analysis steps are carried out in script editor that uses a live programming approach to display interactive, progressively updated visualizations. Tempe uses a temporal streaming engine, Trill [17], as its backend data processor. In the process of creating Tempe, we have discovered new interactivity and responsiveness requirements for Trill. Conversely, building around Trill has shaped the user experience for Tempe. We report on this cross-disciplinary design process to argue that end user experience can be an integral part of creating a data engine.

  • Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing, The Trill Incremental Analytics Engine, no. MSR-TR-2014-54, April 2014.

    This technical report introduces Trill – a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill’s throughput is high across the latency spectrum. For streaming data, Trill’s throughput is 2-4 orders of magnitude higher than today’s comparable streaming engines. For offline relational queries, Trill’s throughput is comparable to a major modern commercial columnar DBMS.

    Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this technical report, we describe Trill’s new design and architecture, and report experimental results that demonstrate Trill’s high performance across diverse analytics scenarios. We also describe how Trill’s ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.

  • Mike Barnett, Robert DeLine, Akash Lal, and Shaz Qadeer, Get Me Here: Using Verification Tools to Answer Developer Questions, no. MSR-TR-2014-10, February 2014.

    While working developers often struggle to answer reachability questions (e.g. How can execution reach this line of code? How can execution get into this state?), the research community has created analysis and verification technologies whose purpose is systematic exploration of program execution. In this paper, we show the feasibility of using verification tools to create a query engine that automatically answers certain kinds of reachability questions. For a simple query, a developer invokes the “Get Me Here" command on a line of code. Our tool uses an SMT-based static analysis to search for an execution that reaches that line of code. If the line is reachable, the tool visualizes the trace using a Code Bubbles representation to show the methods invoked, the lines executed within the methods and the values of variables. The GetMeHere tool also supports more complex queries where the user specifies a start point, intermediate points, and an end point, each of which can specify a predicate over the program's state at that point. We evaluate the tool on a set of three benchmark programs. We compare the performance of the tool with professional developers answering the same reachability questions. We conclude that the tool has sufficient accuracy, robustness and performance for future testing with professional users.

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
USA

Office 1-425-705-4972
Fax 1-425-936-7329