To make a prairie it takes a clover and one bee,—
One clover, and a bee,
And revery.
The revery alone will do
If bees are few.
— Emily Dickinson, 1830-1886
Over the years, my research projects have spanned user interfaces, software engineering and type theory, but they all share a common goal: to make it easier to produce usable, reliable software. When you observe the work practice of an experienced professional, like a surgeon or a car mechanic, you see efficient, graceful use of task-appropriate tools. In contrast, if you watch an experienced software developer doing an everyday task, you see fumbling, confusion and frustration. Software developers are every bit as trained and talented, but their tools and processes are often poorly suited to their tasks.
My group at Microsoft Research, Human Interactions in Programming (HIP), applies user-centered design to software development: studying developers both in the lab and in the field; understanding what is difficult about their typical tasks; building new tools to make those tasks easier; and evaluating those tools with developers. My recent research studies recommender systems for team newcomers, the use of spatial memory to navigate large code bases, retaining knowledge in long-lived projects, and patterns of communication and interruption in co-located and geographically distributed development teams.
- Robert DeLine, Making CHASE Mainstream (Keynote at CHASE Workshop), 17 May 2009.
- Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing, Trill: A High-Performance Incremental Query Processor for Diverse Analytics, VLDB – Very Large Data Bases, August 2015.
This paper introduces Trill – a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill’s throughput is high across the latency spectrum. For streaming data, Trill’s throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill’s throughput is comparable to a major modern commercial columnar DBMS. Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill’s new design and architecture, and report experimental results that demonstrate Trill’s high performance across diverse analytics scenarios. We also describe how Trill’s ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
- Danyel Fisher, Badrish Chandramouli, Robert DeLine, Jonathan Goldstein, Andrei Aron, Mike Barnett, John C. Platt, James F. Terwilliger, and John Wernsing, Tempe: An Interactive Data Science Environment for Exploration of Temporal and Streaming Data, no. MSR-TR-2014-148, November 2014.
Over the last two decades, data scientists performed increasingly sophisticated analyses on larger data sets, yet their tools and workflows remain low-level. A typical analysis involves different tools for different stages of the work, requiring file transfers and considerable care to keep everything organized. Temporal data adds additional complexity: users typically must write queries offline before porting them to production systems. To address these problems, this paper introduces Tempe, a web application providing an integrated, collaborative environment for both real-time and offline temporal data analysis. Tempe's central concept is a persistent research notebook retaining data sources, analysis steps and results. Analysis steps are carried out in a script editor that uses a live programming approach to display interactive, progressively updated visualizations. Tempe uses a temporal streaming engine, Trill, as its backend data processor. In the process of creating Tempe, we have discovered new interactivity and responsiveness requirements for Trill. Conversely, building around Trill has shaped the user experience for Tempe. We report on this cross-disciplinary design process to argue that end user experience can be an integral part of creating a data engine.
- Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel Fisher, John C. Platt, James F. Terwilliger, and John Wernsing, The Trill Incremental Analytics Engine, no. MSR-TR-2014-54, April 2014.
This technical report introduces Trill – a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill’s throughput is high across the latency spectrum. For streaming data, Trill’s throughput is 2-4 orders of magnitude higher than today’s comparable streaming engines. For offline relational queries, Trill’s throughput is comparable to a major modern commercial columnar DBMS.
Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this technical report, we describe Trill’s new design and architecture, and report experimental results that demonstrate Trill’s high performance across diverse analytics scenarios. We also describe how Trill’s ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
- Mike Barnett, Robert DeLine, Akash Lal, and Shaz Qadeer, Get Me Here: Using Verification Tools to Answer Developer Questions, no. MSR-TR-2014-10, February 2014.
While working developers often struggle to answer reachability questions (e.g. How can execution reach this line of code? How can execution get into this state?), the research community has created analysis and verification technologies whose purpose is systematic exploration of program execution. In this paper, we show the feasibility of using verification tools to create a query engine that automatically answers certain kinds of reachability questions. For a simple query, a developer invokes the “Get Me Here” command on a line of code. Our tool uses an SMT-based static analysis to search for an execution that reaches that line of code. If the line is reachable, the tool visualizes the trace using a Code Bubbles representation to show the methods invoked, the lines executed within the methods and the values of variables. The GetMeHere tool also supports more complex queries where the user specifies a start point, intermediate points, and an end point, each of which can specify a predicate over the program's state at that point. We evaluate the tool on a set of three benchmark programs. We compare the performance of the tool with professional developers answering the same reachability questions. We conclude that the tool has sufficient accuracy, robustness and performance for future testing with professional users.
- Mike Barnett, Badrish Chandramouli, Robert DeLine, Steven Drucker, Danyel Fisher, Jonathan Goldstein, Patrick Morrison, and John Platt, Stat! - An Interactive Analytics Environment for Big Data, in ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), ACM SIGMOD, June 2013.
Exploratory analysis on big data requires us to rethink data management across the entire stack – from the underlying data processing techniques to the user experience. We demonstrate Stat! – a visualization and analytics environment that allows users to rapidly experiment with exploratory queries over big data. Data scientists can use Stat! to quickly refine to the correct query, while getting immediate feedback after processing a fraction of the data. Stat! can work with multiple processing engines in the backend; in this demo, we use Stat! with the Microsoft StreamInsight streaming engine. StreamInsight is used to generate incremental early results to queries and refine these results as more data is processed. Stat! allows data scientists to explore data, dynamically compose multiple queries to generate streams of partial results, and display partial results in both textual and visual form.
- Kael Rowan, Robert DeLine, Andrew Bragdon, and Jens Jacobsen, Debugger Canvas: Industrial Experience with the Code Bubbles Paradigm, International Conference on Software Engineering, 2 June 2012.
At ICSE 2010, the Code Bubbles team from Brown University and the Code Canvas team from Microsoft Research presented similar ideas for new user experiences for an integrated development environment. Since then, the two teams formed a collaboration, along with the Microsoft Visual Studio team, to release Debugger Canvas, an industrial version of the Code Bubbles paradigm. With Debugger Canvas, a programmer debugs her code as a collection of code bubbles, annotated with call paths and variable values, on a two-dimensional pan-and-zoom surface. In this experience report, we describe new user interface ideas, describe the rationale behind our design choices, evaluate the performance overhead of the new design, and provide user feedback based on lab participants, post-release usage data, and a user survey and interviews. We conclude that the code bubbles paradigm does scale to existing customer code bases, is best implemented as a mode in the existing user experience rather than a replacement, and is most useful when the user has long or complex call paths, a large or unfamiliar code base, or complex control patterns, like factories or dynamic linking.
- Danyel Fisher, Rob DeLine, Mary Czerwinski, and Steven Drucker, Interactions with Big Data Analytics, in ACM Interactions, ACM, May 2012.
- Andrew Bragdon, Robert DeLine, Ken Hinckley, and Meredith Ringel Morris, Code Space: Combining Touch, Devices, and Skeletal Tracking to Support Developer Meetings, in Proceedings of ITS 2011, ACM, November 2011.
We present Code Space, a system that contributes touch + air gesture hybrid interactions to support co-located, small group developer meetings by democratizing access, control, and sharing of information across multiple personal devices and public displays. Our system uses a combination of a shared multi-touch screen, mobile touch devices, and Microsoft Kinect sensors. We describe cross-device interactions, which use a combination of in-air pointing for social disclosure of commands, targeting and mode setting, combined with touch for command execution and precise gestures. In a formative study, professional developers were positive about the interaction design, and most felt that pointing with hands or devices and forming hand postures are socially acceptable. Users also felt that the techniques adequately disclosed who was interacting and that existing social protocols would help to dictate most permissions, but also felt that our lightweight permission feature helped presenters manage incoming content.
- Andrew Begel, Rob DeLine, and Thomas Zimmermann, Social Media for Software Engineering, in Proceedings of the FSE/SDP Workshop on the Future of Software Engineering Research (FoSER), Association for Computing Machinery, Inc., November 2010.
Social media has changed the way that people collaborate and share information. In this paper, we highlight its impact for enabling new ways for software teams to form and work together. Individuals will self-organize within and across organizational boundaries. Grassroots software development communities will emerge centered around new technologies, common processes and attractive target markets. Companies consisting of lone individuals will be able to leverage social media to conceive of, design, develop, and deploy successful and profitable product lines. A challenge for researchers who are interested in studying, influencing, and supporting this shift in software teaming is to make sure that their research methods protect the privacy and reputation of their stakeholders.
- Robert DeLine, Gina Venolia, and Kael Rowan, Software Development with Code Maps, in Communications of the ACM, vol. 53, no. 8, pp. 48-54, Association for Computing Machinery, Inc., 4 July 2010.
Could those ubiquitous hand-drawn code diagrams become a thing of the past? (NOTE: Also appears in ACM Queue 8:7, Aug 2010.)
One Microsoft Way
Redmond, WA 98052