By Rob Knies
September 14, 2009 3:00 PM PT
The Internet keeps getting better and better.
Ten years ago, people newly introduced to the wonders of the World Wide Web were accomplishing tasks they’d never even contemplated. A decade on, more of us than ever are enthralled by this technological revolution; witness the mushrooming popularity of interactive, Web 2.0 marvels such as Facebook and Twitter, which were barely imaginable in the waning days of the 20th century.
But the advent of these sites comes at a cost. As features are added and enhancements phased in, the code for the Internet’s favorite pastimes keeps getting bigger and bigger. Users eager to jump on the Web 2.0 bandwagon often must wait and wait as unwieldy code as large as a megabyte is downloaded and installed on their computers. In an age of virtually instantaneous gratification, the route to the Internet’s newest sensations seems frustratingly old-fashioned.
What’s to be done? Ben Livshits knows.
Together with colleague Emre Kıcıman, Livshits, a researcher in the Runtime Analysis and Design group at Microsoft Research Redmond, has developed a technology called Doloto, short for Download Time Optimizer. The project, detailed in the paper Doloto: Code Splitting for Network-Bound Web 2.0 Applications and now available for download, analyzes application workloads and automatically slices large codebases into a collection of clusters. Clusters representing top-level functionality, the code needed to make the application work, are transferred to the user’s browser immediately, while those of a secondary nature are delivered on demand.
“The goal of the project is to make Web 2.0 applications run faster,” Livshits says. “This is a whole new area, as far as software development and how to optimize these applications.
“The reason why people move to the Web 2.0 model of development is because code is running on the user’s machine, which means responsiveness is increased. You don’t have to go to the server every time you need something done. But as you move this code to the client to make the application more responsive, the application cannot run before all the code arrives at the client. To make the current generation of applications run faster, or to enable the next generation of these applications, you need to deal with this problem of code delivery. How do you do that? That’s the question that Doloto tries to address.”
And, explains Kıcıman, a researcher for the Internet Services Research Center, there are ancillary benefits, as well.
“A consequence of Doloto,” he says, “is that developers can now add a new feature into their Web applications without worrying about its effect on the application’s download time.
“Before, the usefulness of a new feature would be weighed against its performance impact. Very useful features would be cut because the added code made the application load too slowly. Now, with Doloto, developers can continue to build richer and richer Web-application experiences and be assured that the extra code will not be a burden for users starting the application.”
The timing of the project couldn’t be more opportune. Web-based applications continue to grow in size, and in mobile scenarios, download times are critical, so the less code needed for an application to begin running, the better.
It’s not easy, though. Code splitting is notoriously difficult to achieve manually. An application needs to be built for splitting, with portions grouped according to the stage in the download process when they are expected to be executed. It’s a challenge to maintain such manual code splits when an application or its workload changes.
Doloto, therefore, aims to deliver automatic code splitting, which makes feasible the delivery of additional code clusters as they are needed.
“We interweave the code download in the application execution,” Livshits says, “so you don’t really have to wait, which means the application starts faster and runs faster.”
“Another consequence is that rarely executed code is rarely downloaded. Because we do things on demand, if you have some functionality that is only used by 1 percent of your users, then, generally speaking, it’s only going to be downloaded on demand.”
Examples abound, as Livshits notes. But how is it possible to remove half an application’s code at initialization and still get the application to work?
“The idea is a relatively well-known approach in compiler literature,” Livshits says, “called profile-based optimization. We essentially create a usage profile. Doloto has a proxy-based instrumentation approach that doesn’t require the user to change anything. You don’t have to change the browser; you don’t have to change the application. Doloto intercepts all the code being sent to the browser, rewrites it before it’s shipped to the browser, inserts the instrumentation, and, as the user is using the application, Doloto observes the functionality of the application being used.”
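That profiling step can be pictured with a minimal JavaScript sketch. The wrapper below records the first time each function runs, producing the kind of usage profile the article describes; the function names and the `instrument` helper are hypothetical stand-ins, since Doloto’s actual proxy rewrites real page code in flight.

```javascript
// Hypothetical sketch of profile instrumentation: each function is
// wrapped so its first invocation is timestamped, yielding a usage
// profile. Names here are illustrative, not Doloto's actual API.
const profile = new Map();          // function name -> ms until first call
const startTime = Date.now();

function instrument(name, fn) {
  return function (...args) {
    if (!profile.has(name)) {
      profile.set(name, Date.now() - startTime);  // record first use only
    }
    return fn.apply(this, args);
  };
}

// The rewriter would wrap every function it finds in the page's code:
const renderInbox = instrument("renderInbox", () => "inbox rendered");
const openSettings = instrument("openSettings", () => "settings opened");

renderInbox();   // used immediately: belongs in the initial cluster
// openSettings is never called here, so it never enters the profile,
// marking it as a candidate for on-demand download.
```

The key point is that the instrumentation is purely additive: the wrapped functions behave exactly as before, so the application being profiled needs no changes.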
Doloto examines a script, develops a profile of it, and groups the code into clusters. The remaining code, representing features not commonly used when a program first loads, is clustered separately and downloaded as needed.
“Ideally, initially downloaded clusters would correspond to high-level application functionality,” Livshits says. “Independently of how the developers write their code, they structure it into files for reasons of maintainability, good programming practices, and encapsulation. Doloto is oblivious to all that. It watches what happens at runtime, and then it proceeds to rewrite an application based on what it has observed.”
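A toy version of that clustering decision, under the assumption that the profile maps each function to the time of its first call, might look like the following; the startup window and the function names are illustrative, not Doloto’s actual heuristics.

```javascript
// Hypothetical clustering step: functions first executed within a
// startup window go into the initial cluster; everything else is
// deferred and fetched on demand.
function cluster(profileEntries, startupWindowMs) {
  const initial = [];
  const deferred = [];
  for (const [name, firstUseMs] of profileEntries) {
    if (firstUseMs !== null && firstUseMs <= startupWindowMs) {
      initial.push(name);   // executed during startup: ship immediately
    } else {
      deferred.push(name);  // seen late, or never seen in training runs
    }
  }
  return { initial, deferred };
}

const observed = [
  ["renderInbox", 40],     // fired 40 ms after load
  ["openSettings", 9500],  // fired only when the user opened settings
  ["exportData", null],    // never observed during training
];
const { initial, deferred } = cluster(observed, 1000);
// initial -> ["renderInbox"]
// deferred -> ["openSettings", "exportData"]
```

This matches the behavior quoted above: the split follows observed runtime behavior, not the file boundaries the developers happened to choose.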
Doloto also delivers flexibility.
“To write a high-performance Web application today,” Kıcıman notes, “developers have to optimize their code structure based on performance considerations. This is hard and hurts code readability and maintainability, because performance considerations cut across libraries, class hierarchies, and other traditional code boundaries.
“But with Doloto, developers can focus on their application’s functionality and on making sure that their code is structured to be easy to read and maintain. All the performance-related issues are now handled by Doloto’s rewriting during the deployment phase of the application.”
Using automatic rewriting, Doloto can also take the same codebase and produce different variations of it for a desktop or a mobile phone.
“In the case of a mobile phone,” Livshits adds, “you might opt for smaller clusters, just because moving code or data comes at such a premium. You end up with smaller clusters of a few functions each.”
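One way to picture that tuning knob is a simple greedy packer that caps each cluster at a byte budget; this is an illustrative sketch of the idea, not Doloto’s actual clustering algorithm, and the sizes are made up.

```javascript
// Hypothetical cluster-size tuning: pack deferred functions into
// clusters no larger than a byte budget, so a phone fetches small
// pieces while a desktop can take one larger cluster.
function packClusters(functions, maxBytes) {
  const clusters = [];
  let current = [];
  let size = 0;
  for (const fn of functions) {
    if (size + fn.bytes > maxBytes && current.length > 0) {
      clusters.push(current);     // close the current cluster
      current = [];
      size = 0;
    }
    current.push(fn.name);
    size += fn.bytes;
  }
  if (current.length > 0) clusters.push(current);
  return clusters;
}

const deferred = [
  { name: "openSettings", bytes: 6000 },
  { name: "exportData", bytes: 9000 },
  { name: "printView", bytes: 4000 },
];
packClusters(deferred, 30000); // desktop budget: one cluster of all three
packClusters(deferred, 10000); // phone budget: three small clusters
```

The same profile thus yields coarse clusters for a fast desktop link and fine-grained ones where every kilobyte moved comes at a premium.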
With the code profiled and clustered, the final step in the Doloto process is the rewriting itself.
This process of profiling, training, and execution enables Doloto to load code dynamically, even if an application was not originally developed to support such a scenario.
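The load-on-demand rewriting can be sketched with a stub pattern: each deferred function is replaced by a small stand-in that fetches the real body on first call and caches it. The `makeStub` helper and the fake in-memory “server” below are hypothetical; a real deployment would fetch the cluster’s code over the network and install it.

```javascript
// Hypothetical stub technique: deferred functions are replaced by
// stubs that download and install the real body on first call.
function makeStub(name, load) {
  let real = null;
  return function (...args) {
    if (real === null) {
      real = load(name);          // fetch the deferred code on demand
    }
    return real.apply(this, args); // then behave like the real function
  };
}

// Stand-in for the server holding deferred clusters.
let downloads = 0;
const fakeServer = {
  openSettings: () => "settings opened",
};
const load = (name) => {
  downloads++;                     // count simulated network fetches
  return fakeServer[name];
};

const openSettings = makeStub("openSettings", load);
openSettings(); // first call triggers the (simulated) download
openSettings(); // later calls reuse the cached body
// downloads === 1
```

Because the stub has the same calling convention as the function it replaces, the rest of the application never notices the substitution, which is what lets Doloto retrofit dynamic loading onto code that was not written for it.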
“If you were to write an application tomorrow,” Livshits says, “you would train with respect to a particular set of workloads you anticipate your users to have. It’s a relatively straightforward step. We do it all the time; it takes two or three minutes. But the idea is that you wouldn’t have to do it again unless you have a major rewrite of your code. We don’t anticipate that this is something you would do every day.”
For Livshits, whose research interests encompass compilers, static-analysis tools, runtime-analysis tools, and languages, the motivation for the work that led to Doloto was simple.
“I wanted to analyze and improve applications that I use on a daily basis,” he explains. “I use things like Facebook and Hotmail every day. I find improving applications that I use myself especially compelling, even if we don’t see much of this kind of research in systems and languages.”
The seeds of the Doloto project were planted during an earlier effort called AjaxScope, which addressed application monitoring.
“It’s a natural transition,” Livshits notes, “to start with something like monitoring and then say: ‘Now, we know what’s not working that well. How do we make it better?’
“In many ways, these two projects complement each other quite nicely. Our observation was that the sheer amount of code is what’s making these applications slow oftentimes. You start with figuring out how things are in the wild and measuring things and developing machinery for measuring things and analyzing things, then you move on to engineering, which is how to make things better.”
That, it appears, is exactly what Doloto promises to do.
“For a while,” Livshits says, “people have been talking about the CPU-memory gap. Memory is much slower than the CPU, and a lot of architecture, compiler, and programming-language research is geared toward resolving that. I think what we are starting to see is that there is also a new breed of application that’s network-bound, and these new AJAX applications are a prime example. Going forward, looking at the cloud, we’ll see more and more applications that fall into this space. We will have applications that rely on there being a network and whose performance characteristics depend on the performance characteristics of that network.
“The attraction of this project was that we were breaking new ground in performance-optimization work that focuses on network-bound applications.”
But the effort was not without its difficulties.
“Automating the task is quite challenging,” Livshits stipulates. “In many ways, it’s an engineering challenge. You look at more and more use scenarios and make sure you cover them all as much as you can. It’s not an uncommon challenge. Growing pains are quite significant in this space.
“It’s easy to build something that works on your personal site or a few other sites that you like or applications that you like. But it’s a very difficult task to build something that runs on dozens or hundreds of applications on the Web that you haven’t seen before. Getting to that point is a humongous challenge. It’s easy to underestimate the amount of effort involved in making something that was formulated as a prototype into a tool that’s usable by people who are not experts. Luckily, we’ve had some help, most recently from João Paulo Porto, our intern this summer.”
With Doloto having been released to the public, Livshits and Kıcıman are eager to take the project to the next stage.
“The next step is to see how people use it in the field,” Livshits says. “That’s part of the reason for making this release available: to see what challenges people run into, what scenarios they want to explore.
“Another thing is that it would be important for Doloto to become an integral part of a distributed system. It’s interesting to generalize from the experience we have gained with AJAX applications to see how it can inform the design of future generations of languages and tool kits with the same goal: enabling bigger and more powerful applications that span multiple tiers to run effectively as distributed systems.”
For the moment, however, having shepherded the Doloto project to the point where it can be released into the world is satisfaction in itself.
“Analyzing applications that I use—I think that’s exciting,” Livshits says, “that and creating innovation in an area that’s up for grabs, where we see new languages and projects almost every month.
“This is such an exciting area that creating innovation is something that’s likely to inform the next generation of designs we see. These two things in combination are what drive me to work on this.”