A Live Programming Experience

By Sean McDirmid, Microsoft Research

Live programming provides live feedback about how code executes while it is being edited. Unfortunately, live programming and similar efforts are still not very well understood, and many attempts are limited to demo-ware, fancier LISP-like REPLs, or Smalltalk-like fix-and-continue IDEs, which, while useful, lack true live feedback. This essay presents a new live programming experience called APX (A Programming eXperience, a play on Iverson's APL) that aims to overcome these challenges with a language, type system, rich code editor, and virtual time machine designed for useful live feedback. We frame our discussion of live programming's design and technology challenges in a description of APX.

Let’s get right into it with a simple example that draws a circle:

The programmer first types a '?', which creates an open term rendered as '◌'. Open terms represent unspecified symbols; they leverage type inference to support code completion and scrubbing, and can even take on default values that enable early execution (we describe open terms later in a discussion of types). This open term is completed into a single-argument 'draw' call whose shape argument is an open term following '#'. The shape argument is then completed into a circle with three open-term arguments (center, radius, and color). These open terms all have default values, so the statement executes, drawing a circle on the screen.
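
APX's own syntax appears only in the recordings, so as a rough analogy the following Python sketch shows how type-derived defaults let an incomplete call execute early; the 'SCREEN' constant and the dictionary-based 'draw' are hypothetical stand-ins, not APX's API:

    import random

    SCREEN = (800, 600)  # hypothetical screen size used for default positions

    def circle(center=None, radius=None, color=None):
        """A circle whose unspecified (open-term) arguments fall back to
        type-derived defaults so the enclosing call can run early."""
        if center is None:  # default: a random on-screen position
            center = (random.uniform(0, SCREEN[0]), random.uniform(0, SCREEN[1]))
        if radius is None:  # default: a modest radius
            radius = 20
        if color is None:   # default: a neutral gray
            color = (0.5, 0.5, 0.5)
        return {"shape": "circle", "center": center, "radius": radius, "color": color}

    def draw(shape):
        # stand-in for rendering: just report what would be drawn
        print("drawing", shape)

    draw(circle())  # executes immediately, before any argument is filled in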

Next, circle's position argument is changed into a tuple whose horizontal component is edited to 100 and then edited to 200. Although APX completely supports keyboard-based live editing, this often is not very fluid; e.g. to edit "100" from "200", the '1' must first be deleted causing the circle to jumps left before jumping back right when '2' is added. This jarring behavior demonstrates how the flashy automatic execution aspect of live programming alone is not enough: live feedback must be comprehensible, useful, and easily acted on by the programmer. Chris Hancock explains this with an analogy comparing archery to hitting a target with a water hose:

Hitting a bull’s eye with an arrow is a careful process: you aim, shoot, string another arrow, carefully correct the aim, shoot again, and so on. It is difficult enough that archery is an Olympic event—a distinction unlikely to be accorded to the challenge of hitting a target with water from a hose... The archer and the waterer are engaged in qualitatively different processes, and the key to the difference is this: the waterer never stops aiming. The only movements the waterer needs to make are corrections to a single, evolving aim.

To get useful live feedback, the programmer can instead make incrementally coherent changes through mouse-based scrubbing, which is done here:

The horizontal position is scrubbed in a continuous motion where the programmer can increase and decrease the value with slight physical movements while observing correspondingly slight virtual movements of the circle. As live feedback, scrubbing is water hosing while keyboard editing is archery. However, many coding tasks involve more than editing numbers, and sliders are blunt tools for changing values that have finer semantic meanings. To solve these problems, APX leverages its type information to expand where scrubbing is possible. Consider an enhancement to the example where circle color, shape, position and angle are scrubbed:

A brown color is chosen for the circle drawn in the last example. The 'brown' call is then scrubbed through different colors; as with the horizontal position of the last example, the color of the circle changes immediately as different colors are highlighted. The shape of the circle can also be scrubbed; again the shape changes as each option is highlighted. Scrubbing on shape and color like this is an "API scrub," enabled by associating constructs with peers that can replace them; e.g. brown with red, blue, and fuchsia, or circle with square, rectangle, ellipse, and triangle.

Scrubbing can also be customized for complex and semantically rich numeric values. After the circle in the example is scrubbed into a triangle, additional scrubbing repositions the triangle by scrubbing its tuple position with a vector scrub widget. The triangle's rotation argument is then scrubbed to a 299-degree angle using a radial scrub widget that activates because an angle type is inferred on it. We discuss type inference's interaction with scrubbing later.
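
The essay does not show how peer associations are specified, but a minimal sketch of the idea, with a hypothetical 'PEERS' table and 'api_scrub' helper in Python, looks roughly like this:

    # Peer registry: each construct maps to the peers a scrub can swap in.
    PEERS = {
        "brown":  ["red", "blue", "fuchsia"],
        "circle": ["square", "rectangle", "ellipse", "triangle"],
    }

    def api_scrub(term, step):
        """Replace `term` with one of its registered peers; `step` is how far
        the scrub gesture has moved through the list of options."""
        options = [term] + PEERS.get(term, [])
        return options[step % len(options)]

    # Each highlighted option is executed immediately while scrubbing occurs.
    print(api_scrub("brown", 1))   # -> red
    print(api_scrub("circle", 4))  # -> triangle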

Liveness in Language

Language plays a significant role in how effective live feedback is. Again quoting Hancock:

As a thought experiment, imagine taking Logo (or Java, or BASIC, or LISP) as is, and attempting to build a live programming environment around it. Some parts of the problem are hard but solvable, e.g. keeping track of which pieces of code have been successfully parsed and are ready to run, and which ones haven’t. Others just don’t make sense: what does it mean for a line of Java or C, say a = a + 1, to start working as soon as I put it in? Perhaps I could test the line right away—many programming environments allow that. But the line doesn’t make much sense in isolation, and in the context of the whole program, it probably isn’t time to run it. In a sense, even if the programming environment is live, the code itself is not.

Statements that have meaning over all of time, not just at a single point of sequential execution, are more amenable to live feedback because it is "always time to run them". Examples where statements have time-continuous (or time-less) meanings include declarative rule-based, data-flow, and functional paradigms. The oldest and most popular "live" languages are based on visual data flow (connect the blocks); e.g. see Apple’s Quartz Composer. Functional-reactive programming (FRP) is built around the notion of continuously executing functions as signals, which formed the basis of my earlier live programming system.

Unfortunately, declarative programming can feel quite restrictive in doing even simple things like drawing a circle on a screen or creating a dynamically changing list of objects. More significantly, declarative paradigms are often either very live but too concrete, making them less scalable, or too abstract, creating a huge gulf between code and evaluation that live feedback cannot easily bridge. In the former case, Quartz Composer is very live and concrete, wiring components directly together, but non-trivial programs quickly become a huge mess of boxes and wires. In the latter case, FRP lifts function evaluation over time, so execution can only be influenced indirectly through higher-order function composition; code can never touch values!

APX solves the live language problem with continuous execution of imperative statements combined with event handlers that perform discrete stepped updates. For example, the 'draw' method is an imperative command that executes continuously. If the position of the drawn shape happens to change while 'draw' is executing continuously, then what is drawn simply moves according to the new position, as in a live data-flow or FRP system; consider:

In this example, the 'draw' method executes continuously on a 'circle' with a position 'p', which is initially (20,20). A handler on a built-in 'tick' event is then added to increment 'p' on every tick. As soon as that statement is completed, the circle moves. However, execution is paused and only the trajectory of the circle is observed via a strobing effect. Scrubbing the vertical component of the vector added to 'p' changes the circle's trajectory, and scrubbing on the 'tick' event allows observation of how the circle moves over time; i.e. such scrubbing performs time travel.

APX incorporates events and time into live programming by treating them as variables to be scrubbed, making time tangible (see Bret Victor). Reasoning about program execution in real-time is often not very useful: the program can execute far too fast for live feedback to be comprehensible. Instead, APX allows for live programming over multiple steps of the program's execution (in this case 200 ticks). The programmer can then explore this execution by scrubbing a tick reference, which is done in the above example to see where the circle is at various time points. Strobing is then used to sample the circle's position at certain points in time, making the circle's trajectory obvious in a way that time scrubbing alone cannot.
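
As a rough Python analogy (not APX code), time travel and strobing can both be read as indexing into a recorded history of per-tick state; the 200-tick window and (20,20) start come from the example, while the velocity and helper names are assumptions:

    TICKS = 200

    def run(start=(20, 20), velocity=(1, 2)):
        """Record the circle's position at every tick (the per-tick state that
        makes time travel possible)."""
        p, history = start, []
        for _ in range(TICKS):
            history.append(p)
            p = (p[0] + velocity[0], p[1] + velocity[1])  # discrete stepped update
        return history

    history = run()

    def scrub_tick(t):
        # time travel: show the circle wherever it was at tick t
        return history[t]

    def strobe(every=20):
        # strobing: sample the recorded trajectory every `every` ticks
        return history[::every]

    print(scrub_tick(120))  # position at tick 120
    print(strobe())         # sampled trajectory, rendered as ghosted circles in APX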

The benefits of time tangibility are apparent in a more complex example; here one or more circles are influenced by gravity and bounce off the sides of the screen:

A top-level for-loop draws a circle on each iteration with a position 'p' and radius 'r' for the circle it represents, which are initialized to open terms that evaluate to appropriate random values; each circle is also given a random color via a call to 'randColor'. Code inside the "on tick" block makes discrete stepped updates to position 'p' and velocity 'v' using standard physics computations; a call to 'inflect1' creates a bounce when a circle's position leaves the bounds of the screen. As the initial velocity 'v' is scrubbed in the example, the circle's trajectory, made apparent through strobing, is updated automatically. The for-loop's 'until' boundary can also be scrubbed to increase or decrease the number of balls created. As with other scrubs, the selection is executed immediately while scrubbing occurs. After the selection is made, scrubbing the 'v' velocity in the body of the loop immediately provides feedback on all loop iterations and 200 ticks of execution.
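
The essay only shows this code in the recording, so the following Python sketch is a guess at its shape rather than APX itself; 'inflect0' and 'inflect1' follow the names used later in the essay, but their bodies, the screen size, and the gravity constant are assumptions:

    import random

    SCREEN = (800, 600)   # hypothetical screen bounds
    GRAVITY = (0.0, 0.5)  # assumed downward acceleration per tick
    TICKS = 200

    def inflect0(x, lo, hi):
        """Return -1 when coordinate x lies outside [lo, hi], else 1."""
        return -1 if x < lo or x > hi else 1

    def inflect1(p, v):
        """Bounce: when a coordinate leaves the screen, point its velocity back inside."""
        vx, vy = v
        if inflect0(p[0], 0, SCREEN[0]) < 0:
            vx = abs(vx) if p[0] < 0 else -abs(vx)
        if inflect0(p[1], 0, SCREEN[1]) < 0:
            vy = abs(vy) if p[1] < 0 else -abs(vy)
        return (vx, vy)

    def simulate(p, v):
        """The "on tick" body: apply gravity, bounce, then step the position."""
        trajectory = []
        for _ in range(TICKS):
            v = (v[0] + GRAVITY[0], v[1] + GRAVITY[1])
            v = inflect1(p, v)
            p = (p[0] + v[0], p[1] + v[1])
            trajectory.append(p)
        return trajectory

    # One ball with a random start; scrubbing 'v' in APX simply reruns this.
    start = (random.uniform(0, SCREEN[0]), random.uniform(0, SCREEN[1]))
    print(simulate(start, (3.0, -4.0))[::20])  # strobed view of the trajectory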

Live Debugging

There is more to debugging than observing user-oriented output, which some programs might not even have. We also need "programmer-oriented" output that makes it easy to see what values are flowing through the program. As Bret Victor laments:

We expect programmers to write code that manipulates variables, without ever seeing the values of those variables. We expect readers to understand code that manipulates variables, without ever seeing the values of the variables. The entire purpose of code is to manipulate data, and we never see the data. We write with blindfolds, and we read by playing pretend with data-phantoms in our imaginations.

On the other hand, pervasively showing evaluation results can overwhelm programmers with context that prevents them from focusing on their current problems. To address this, APX provides a probe construct that allows programmers to easily see evaluation results that they are interested in; consider:

An explicit probe is first attached to the 'randColor' (random color) argument of 'draw#circle' by prepending a '◎' to the term (typed as '@'). The evaluation of 'randColor' is then displayed as live text in the gutter beneath the '◎'. Probes are enabled and observed directly in code, allowing programmers to quickly and easily use them to obtain live feedback. Probe displays can also show more than just text; e.g. this example's probe shows the actual color being rendered, allowing programmers to quickly correlate probe results with the circle being drawn on the screen.

Code can execute in many contexts, so which results are shown depends on a debug focus that is set through scrubbing or navigation. The probe in this example initially shows the color for the first circle drawn ('y' = 0). The loop iterator 'y' term can be scrubbed to move the debug focus to other iterations of the for-loop. The color probe then changes according to which value is highlighted when scrubbing 'y', which can be correlated to the circles drawn.
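
As a minimal Python sketch (not APX's implementation), a probe can be read as a transparent pass-through that records its value per execution context, with the debug focus choosing which recording the editor displays; the names below are hypothetical:

    import random

    probe_values = {}   # (probe id, execution context) -> recorded value

    def probe(probe_id, context, value):
        """Probes are transparent: record the value for this context and pass it on."""
        probe_values[(probe_id, context)] = value
        return value

    def rand_color(seed):
        random.seed(seed)
        return tuple(round(random.random(), 2) for _ in range(3))

    # The for-loop body probes 'randColor' once per iteration y.
    for y in range(3):
        color = probe("randColor", ("loop y", y), rand_color(y))

    debug_focus = ("loop y", 0)                      # initially the first iteration
    print(probe_values[("randColor", debug_focus)])
    debug_focus = ("loop y", 2)                      # scrubbing 'y' moves the focus
    print(probe_values[("randColor", debug_focus)])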

APX also supports method navigation (ctrl-G) to establish debug focuses for methods, as well as non-code probes (ctrl-P) that probe expression values on demand; consider:

We use an on-demand probe (ctrl-P) on the position 'p' term to see the position value of the light blue circle. The cursor is then moved to the 'inflect1' term and ctrl-G is pressed, jumping to the definition of this method. The screen size term is probed along with the two 'inflect0' calls. The second call yields a -1 since the light blue circle's Y coordinate is out of bounds, leading to a bounce that can be observed in the user output.

Probing and execution navigation allow programmers to diagnose problems once they have identified them. However, they do not provide a comprehensive view of program execution that would help find problems in the first place. For this, constructs that trace (sample and log) program execution are more appropriate, the simplest of these being old-fashioned calls to 'printf'. APX provides a live version of 'printf' that works well with live feedback. The resulting 'log' construct is used to log circle position and velocity in the following example:

An explicit probe is added to the expression returned by the 'inflect1' method. A 'log' statement is then added to the body of the for loop to display each iteration's index 'y', position 'p', and velocity 'v'. Each column appears in the log view (bottom right pane) as soon as the 'log' statement is syntactically correct during editing. The log views a single point in time that can be adjusted by scrubbing on 'tick'; the log changes automatically with respect to the highlighted time.

Clicking on a log entry navigates to the 'log' call that generated that entry and sets the debug focus to that specific call's execution context. In this example, this means that the color probe on 'randColor' changes along with the probe set on the result of 'inflect1'. In this way, log entries not only provide programmers with comprehensive feedback about program execution, but also act as bookmarks into that program execution!
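
A rough Python sketch of the bookmark idea, with hypothetical names and stand-in values: each log entry carries the execution context that produced it, so selecting an entry can restore that context as the debug focus.

    log_entries = []

    def log(context, **columns):
        """Record one row of the log view along with the context that produced it."""
        log_entries.append({"context": context, "columns": columns})

    # The for-loop body at one highlighted point in time (tick t).
    t = 42
    for y in range(3):
        p, v = (10 * y, 20 * y), (1, -2)   # stand-ins for the real position/velocity
        log({"tick": t, "y": y}, y=y, p=p, v=v)

    # Rendering the log view for the highlighted tick.
    for entry in log_entries:
        if entry["context"]["tick"] == t:
            print(entry["columns"])

    # "Clicking" an entry jumps back to its context: probes elsewhere in the
    # code now show values for that iteration and tick.
    debug_focus = log_entries[1]["context"]
    print("debug focus is now", debug_focus)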

Type Less with Type-less

Static types offer a lot to live programming in enabling code completion, making sense of live feedback, and boosting performance (responsive live feedback does not come cheap!). However, existing type systems are not well suited to ad-hoc live programming where types, and even constructs being used, are unknown for long periods of time. Even languages that support aggressive type inference, such as Haskell, enforce a linear progression where types are first defined and then used immutably. Terms have types based on how constructs are defined, and therefore something that is unknown necessarily lacks a type.

APX includes a type system called Type-less that is based on "backward" type inference to handle programs with incomplete type information that changes incrementally. By backwards, we mean that the type inferred for a term is based on usage rather than definition: X is typed as a duck because it needs to quack as opposed to typing X as a duck because it is defined to quack. Going backwards allows for open ◌ terms that are unresolved but still have inferred types based on usage. These types can then be used for code completion, scrubbing, or to provide default values that enable earlier execution. Consider an example like the first of this essay where a drawn circle's color comes from a color provider:

On a line by itself, an open '◌' term is inferred to have a void type, where code completion on it then brings up all expressions with void types (void methods and variables that can be assigned), including 'draw', which is selected. 'Draw' has one argument completed as an open term that then undergoes code completion itself, bringing up all possible shapes. Options that do not lead directly to shapes are also presented but grayed out as possible but less likely; e.g. a shape from tuple methods like 'item2', since ATup2 is a generic type. Impossible completions, such as a void method in a non-void context, are omitted altogether since their selection can only lead to the disappointment of type errors.

After 'circle' is selected, the draw statement immediately executes because all of circle's arguments, which are completed as open terms, have types that are associated with default values: a center position and radius are selected at random based on the screen's current size, while the circle is given a default grayish color. With such defaults, code executes earlier so live feedback can assist programmers in completing their code.

Code completion considers both the receiving object's type (if specified) and the inferred type of the open term being completed. If a completion has a receiving object that is otherwise unaccounted for, the receiving object is completed as an open term. In the example, the open term used as the color argument is probed (so we can see its type) and then completed on. Again various colors are visible while many less likely options are grayed out. The 'useColor' variable is a candidate for use and is emphasized as it directly provides a color; using this as the completion causes a receiver open term to be generated in the editor. This receiver open term is then completed on, bringing up 'cp' as an option. Once 'cp' is selected, the circle turns red.
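
Type-less itself is not spelled out in this essay, so the following Python sketch only gestures at the backward direction: a term's inferred type is the set of requirements accumulated from its uses, and completion offers whatever satisfies those requirements. The 'LIBRARY' table and helper names are hypothetical:

    from collections import defaultdict

    requirements = defaultdict(set)   # open term -> traits its usage demands

    def use_as(term, trait):
        """Record that `term` appears where `trait` is required (backward inference)."""
        requirements[term].add(trait)

    def completions(term, library):
        """Code completion: offer everything that satisfies the term's usage so far."""
        needed = requirements[term]
        return [name for name, traits in library.items() if needed <= traits]

    # A tiny library mapping constructs to the traits they provide.
    LIBRARY = {
        "circle":      {"Shape"},
        "triangle":    {"Shape"},
        "brown":       {"Color"},
        "cp.useColor": {"Color"},
        "draw":        {"Void"},
    }

    use_as("shape?", "Shape")              # an open term passed where a shape is needed
    print(completions("shape?", LIBRARY))  # -> ['circle', 'triangle']
    use_as("color?", "Color")              # ...and another where a color is needed
    print(completions("color?", LIBRARY))  # -> ['brown', 'cp.useColor']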

Almost all non-prototype languages require pre-commitment of an object's types when it is created. However, programmers often do not know exactly what they need their object to be; they just know they need an object! Objects in Type-less have definitions that are inferred based on their usage. Consider this example that draws objects from a generic set:

Three objects are created using 'new' and inserted into a set 'things'. Sets are generic with a type parameter 'ItemT' that each element is bound to. The first for-loop iterates over 'things' to call the 'animate' method defined in the 'ThingA' trait. This causes 'things.ItemT' to be bound by 'ThingA', which is then backwards propagated to the three objects inserted into the set, causing three circles to immediately appear on the screen as per the behavior of 'animate'. The second for-loop again iterates over 'things' to assign the 'rectRotate' variable of each element to 30. As 'rectRotate' is defined in 'ThingB', 'things.ItemT' is now also bound by that trait. The three objects are then inferred to extend 'ThingB', which overrides the 'animate' method behavior to include a rotated rectangle.

The idea that usage defines objects, rather than object definition restricting usage, allows for a more fluid live programming experience that avoids pre-commitment. APX does sacrifice member overloading to achieve this: all members exist in a single public namespace, although they can be disambiguated by known inferred types, and the IDE can request clarification from the programmer when needed. Inferring term types based on usage also works against early error detection. While useful, early error detection is not the primary use case for types in live programming. Still, Type-less includes consistency checks to ensure that conflicting traits are not extended by the same term; e.g. using a number as a shape leads to a type error since 'Shape' and 'Number' cannot co-exist. This check covers many, but not all, of the errors that forward type checking would catch.

Type inference determines how values are scrubbed; e.g. a number used as a horizontal component is scrubbed differently from an angle. This is accomplished with brands, which are traits that can be extended by constants; consider:

Plain constant numbers are used as horizontal and vertical coordinates, radii, color component percentages, angles, and so on. Each use is inferred with different brand traits that lead to different scrubbing behavior: radial scrubbing for rotation, a percent slider for color components, and sliders for horizontal and vertical positions that begin and end according to the corresponding screen dimension. Tuples can also be branded: in this example, color is a three-component tuple with a custom "color picker" scrub, while position supports a vector scrub that respects screen boundaries.
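
A minimal Python sketch of brand-driven scrub selection, with a hypothetical brand-to-widget table; the real associations live in APX's C# libraries, as noted in the concluding remarks:

    # Hypothetical mapping from brand traits to the scrub widget the editor shows.
    SCRUBBERS = {
        "Angle":    "radial dial",
        "Percent":  "0..100% slider",
        "X":        "horizontal slider bounded by screen width",
        "Y":        "vertical slider bounded by screen height",
        "Color":    "color picker over a 3-component tuple",
        "Position": "2-D vector scrub bounded by the screen",
    }

    def scrubber_for(brands):
        """Choose a widget for a constant from the brands its usage infers."""
        for brand in brands:
            if brand in SCRUBBERS:
                return SCRUBBERS[brand]
        return "plain numeric slider"   # fall back when no brand is inferred

    print(scrubber_for({"Angle"}))    # a rotation argument
    print(scrubber_for({"Percent"}))  # a color component
    print(scrubber_for(set()))        # an unbranded number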

Type-less supports nominal subtyping, inferred variance, self typing, as well as F-bounded parametric polymorphism; in all cases, type annotations are completely optional for client code that does not define new methods or traits, and are greatly reduced otherwise. Unlike Haskell, type inference in Type-less is not global: inference cannot cross trait boundaries, so local modular reasoning is preserved. A full discussion of Type-less deserves another essay.

Type-less also helps keep APX's performance up: expressions that are inferred as primitive types (tuples, numbers) can be "unboxed" with their operations inlined, meaning simple operations like adding two numbers or two points together are cheap. These sorts of optimizations are important given that live feedback is much less effective if it is not timely.

A Virtual Time Machine

The biggest question about live programming is whether or not it is even technically feasible for programs of non-trivial complexity. Given the immaturity of APX, we cannot answer that question definitively at this time, but our experience in building this prototype is encouraging. APX is implemented on top of what we call a virtual time machine (VTM) that manages time by tracing task state dependencies and incrementally re-executing tasks when their dependencies change. The VTM also maintains different values for the same state at different times according to a sliding window to enable time travel (e.g. via scrubbing on 'tick') and multi-time visualizations (e.g. strobes).
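
As a toy Python sketch of the dependency-tracing core (it omits the sliding window of versioned state, undoable effects, and cycle detection the VTM needs), tasks record which cells they read while running, and changing a cell re-executes just the tasks that read it; all class and variable names are hypothetical:

    CURRENT_TASK = None   # the task whose dependencies are currently being traced

    class Cell:
        """A piece of state whose readers are discovered dynamically."""
        def __init__(self, value):
            self.value, self.readers = value, set()

        def get(self):
            if CURRENT_TASK is not None:
                self.readers.add(CURRENT_TASK)   # trace the dependency
            return self.value

        def set(self, value):
            if value != self.value:
                self.value = value
                for task in list(self.readers):  # incrementally re-execute dependents
                    task.run()

    class Task:
        """A unit of re-execution (APX 'task' blocks play a similar role)."""
        def __init__(self, body):
            self.body = body
            self.run()

        def run(self):
            global CURRENT_TASK
            previous, CURRENT_TASK = CURRENT_TASK, self
            try:
                self.body()
            finally:
                CURRENT_TASK = previous

    x = Cell(20)
    Task(lambda: print("draw circle at", (x.get(), 20)))  # runs once now...
    x.set(200)  # ...and automatically reruns when its dependency changes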

VTM's approach to reactive computation is intrinsically optimistic: rather than trying to do everything in the right order, the VTM will execute tasks, discover their dependencies dynamically, and re-execute them as needed. Besides freeing programmers from having to reason about correct orders manually, this approach is also more expressive and naturally supports live programming where arbitrary code can change at any time. There are drawbacks, however:

  • All side effects must be commutative (executable in any order) and must be undoable so they can go away if found to be unneeded. Variable assignment is restricted to priority-based assign-once per time-point semantics to ensure commutativity.
  • The approach must check for and reject dependency cycles because changes can be non-monotonic.
  • The bookkeeping overhead of dependency tracing, state versioning, logging of side effects, and cycle detection can be very high.

Our current prototype reduces bookkeeping overhead through separate slow and fast path task re-executions: a slow path performs bookkeeping work that a fast path does not need to do as long as certain invariants hold, e.g. branching behavior remains the same and the same objects are accessed. This is effective since code execution is relatively "coherent" over time: it executes about the same at t1 as it does at t0, usually with only small changes to primitive values, which fast paths can handle. Slow paths also do polymorphic inline caching to eliminate virtual method overhead from fast paths.

To further improve performance, the VTM dynamically compiles APX code using the Dynamic Language Runtime, unboxing and inlining operations on primitive values according to inferred and explicit type information. The result is encouraging even if there is probably still some way to go. Consider a final stress-test example with collision detection and reaction on multiple bouncing 'Ball' objects:

This example contains many 'task' blocks, where a task is a unit of computation in APX and the VTM that undergoes re-execution separately from its parent task. Adding task blocks to code influences performance by changing what is re-executed together; in the example, the assignment to 'index' is in a separate task from its uses since 'index' is a hash of the ball's position and therefore changes more rarely than a ball's position itself.
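
The essay does not show how 'index' is computed, but reading it as a coarse spatial hash makes the task split concrete; the cell size and helper below are assumptions in a minimal Python sketch:

    CELL = 50  # hypothetical grid cell size in pixels

    def position_index(p):
        """Bucket a ball into a grid cell; this changes far less often than
        the position itself, so its task re-executes rarely."""
        return (int(p[0]) // CELL, int(p[1]) // CELL)

    # Only balls in the same (or a neighboring) cell need a precise collision
    # test, and downstream collision tasks only re-execute when a ball
    # actually crosses into a new cell.
    print(position_index((120.0, 80.0)))   # -> (2, 1)
    print(position_index((130.0, 90.0)))   # same cell: no downstream re-execution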

This example is computationally more intensive than the previous ones since balls interact through collisions. Live feedback is still useful, but now lag in strobe updates is readily apparent. As the example goes to five balls, the effects of incremental re-execution also become clear. When balls are added, the other balls are unaffected by the new ball until a collision occurs; they are not even re-executed before these collision points! When velocity is changed for all balls, their strobes update smoothly before they collide. As soon as collision points are reached, however, updates become jerkier as changes in the balls' positions now influence each other.

There are ways to improve performance here. For one thing, physics could be implemented in C# with APX as a scripting client; collision detection and reaction could then be better optimized. However, this is like saying Python code can go faster by being written in C! Perhaps there are improvements to be had with a better design, e.g. by using swept collision detection rather than position hashing.

A VTM is analogous to a garbage collector that manages time rather than memory. Even though garbage collection was invented in the 60s, it took until the 90s to become mainstream, helped along by various hardware and software improvements. As with those garbage collectors, live programming technology should improve over time if programmers happen to find live programming experiences useful!

Concluding Remarks

This essay has hopefully clarified some of the design aspects of live programming along with some of its technology issues. There is still a lot of work to be done and ideas to be tried, including:

  • Scrubbing is key to a useful live programming experience, and should be expanded beyond constants. The API and custom scrubbing show one way to do this, but more work needs to be done to make these extensible so that they can apply to user-defined APIs (right now, it is all enabled in C# with the help of the type system).
  • Many kinds of visualizations are useful in debugging code, while this essay has only explored a few (strobing, probing, textual logging). In particular, programmers would find timelines, connected graphs, tables, and so on, useful in constructing programmer-oriented interfaces that help them debug code. Future work on APX should explore how different kinds of visualizations can be added in a way that is compatible with live feedback.
  • The ability to link code execution with output leads to a lot of interesting possibilities. For example, one could navigate from paused UI elements to executing code, or even reflect changes in the UI back in the code (we explore UI-code connections in a previous paper; Bret Victor also demonstrates this in his Learnable Programming essay).
  • There are also many great ideas in Bret Victor's Learnable Programming essay that we have yet to explore very well (e.g. creating by abstracting). One of the challenges here is turning these ideas into workable general experience constructs: the devil is always in the design and technology details! Type-less, for example, was needed to make many of the features demonstrated in this essay feasible.
  • Type-less allows object definitions to be inferred based on usage, which is especially useful when driven by code completion. The next step is to use machine learning to make more aggressive inference that can lead to even earlier program executions.
  • Can the concept of a virtual time machine be improved on with a lower-level implementation? For instance, OS-level constructs could be more efficient in managing the time of a program.

Live programming is a very exciting topic rich in opportunity and challenges.
