By Janie Chang
November 22, 2010 9:00 AM PT
Microsoft’s annual Professional Developers Conference (PDC) gathers software developers for two days of technical sessions and a chance to meet some of the people behind Microsoft’s technology innovations. The 2010 PDC, held Oct. 28-29 in Redmond, attracted a sellout crowd. Those who could not attend were not left out of the excitement, however: They had the option to participate at various Microsoft offices, academic institutions, and other locations worldwide—or to view high-definition, live, streaming broadcasts of keynotes and sessions from their own computers.
Microsoft Research contributed to the excitement at PDC10 with 20 demos that echoed the conference’s computing themes of client and devices, cloud services, and framework and tools. Developers appreciated the deep technical content and a hands-on approach, while researchers enjoyed an opportunity to mingle and chat with potential users.
Peli de Halleux, senior research software-development engineer, and colleague Nikolai Tillmann, principal research software-development engineer, were on duty for a lighthearted but educational demo. As part of the Research in Software Engineering group (RiSE), the two are active in development work for Pex, a Visual Studio add-in for testing .NET Framework applications. Pex, which has been available since April 2008, automatically generates test suites with high code coverage, right from the Visual Studio code editor. During PDC10, de Halleux and Tillmann demonstrated Pex for fun, a website that offers a simplified version of the fully featured Pex power tool.
“Part of our work,” Tillmann explains, “is to acquaint students with our research results and tools—in particular, Pex. But it gets complex if students first have to install the right version of Windows and Visual Studio just to try it out. So we created the website as an educational tool that makes Pex easy to try out.”
The site enables users to write small C#, Visual Basic, or F# programs and analyze their code without needing to install any software—all the work happens in the cloud. The Pex team then decided to load the website with code puzzles for users to solve. But they didn’t stop there. Pex for fun also hosts coding duels in which users compete to write code that replicates the implementation of a secret code puzzle.
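The coding-duel idea can be illustrated with a short Python sketch. This is not Pex and not the site's code: Pex generates its test inputs automatically, while this sketch substitutes a plain brute-force search over a handful of integers. The names secret_puzzle, attempt, and find_mismatch are hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable, Optional

def secret_puzzle(x: int) -> int:
    """Hypothetical hidden implementation the player must replicate."""
    return x * 2 + 1 if x >= 0 else 0

def attempt(x: int) -> int:
    """Hypothetical player attempt; it ignores the negative-input case."""
    return x * 2 + 1

def find_mismatch(
    secret: Callable[[int], int],
    candidate: Callable[[int], int],
    inputs: Iterable[int],
) -> Optional[int]:
    """Return the first input on which the two functions disagree, else None.

    Assumption: brute-force enumeration stands in for the automated
    test generation that the real tool performs.
    """
    for x in inputs:
        if secret(x) != candidate(x):
            return x
    return None

# A mismatch tells the player which input to think about next.
mismatch = find_mismatch(secret_puzzle, attempt, range(-3, 4))
print("first disagreeing input:", mismatch)
```

In this sketch, a duel counts as won when no disagreeing input is found; the player would keep revising attempt until find_mismatch returns None.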
What kind of reaction did Pex for fun get from attendees?
“We got a lot of questions around Visual C# Intellisense,” de Halleux replies, “because having that functionality in the browser took many people by surprise. Everyone wanted to know how we did it. The Visual Studio-like editing functionality makes it fun to write code in the browser.”
The developer community took notice.
“They all had a good time,” Tillmann grins. “The great thing about using coding duels was that it made the presentation interactive: Instead of us showing everything, we let our visitors take over the keyboard and win the duels. People were really getting into it, suggesting fixes and enjoying the experience.
“Since Pex for fun launched in June 2010, the site has logged more than 200,000 attempts to win coding duels. It often takes several attempts to win, and some people give up. But more than 12,000 coding-duel sessions have been won! There is a live feed of what is happening at Pex for fun, if people are interested in knowing the stats.”
On a more serious note, the success of Pex for fun has prompted the RiSE team to take other research tools into the cloud for easy access by developers.
Client-plus-cloud computing is a relatively new computing platform that harnesses the virtually unlimited, on-demand computation and data storage offered by web services to deliver applications and data to client devices such as PCs, mobile phones, sensors, and other hardware. Building scalable, reliable, and efficient applications for the cloud, an effort still in its early stages, is difficult mainly because existing programming models and tools don’t provide the architectural support that scalable, distributed applications need.
Sergey Bykov, lead software development engineer with Microsoft Research’s eXtreme Computing Group, spent his time during PDC10 explaining how the Orleans project developed a new software platform, which runs on Windows Azure, that makes it easier to build cloud services.
“People build such systems today,” Bykov says, “but the effort is very expensive and error-prone. One of the main goals we have for Orleans is to democratize programming for the cloud by providing the right set of abstractions and by encapsulating a set of best-practice techniques that should guide the programmer to success.”
Orleans offers a simple programming model based on .NET and built around the concept of a “grain,” a computation unit with private and shared state that communicates by sending messages to other grains and by receiving and replying to requests from clients. Combined with the Orleans distributed runtime and tools, the platform raises the level of abstraction and helps developers build scalable, correct cloud applications.
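For readers who think in code, the grain idea resembles a small message-driven object. The Python sketch below is only an analogy built on the standard asyncio library. It is not the Orleans API, which is based on .NET, and the class CounterGrain with its ask method is hypothetical. It shows private state and request-reply with a client, and leaves out shared state and grain-to-grain messaging.

```python
import asyncio
from typing import List

class CounterGrain:
    """Hypothetical grain-like unit: private state, changed only by messages."""

    def __init__(self) -> None:
        self._count = 0                                  # private state
        self._mailbox: asyncio.Queue = asyncio.Queue()   # incoming requests

    async def run(self) -> None:
        # Handle one message at a time, so the state needs no locks.
        while True:
            message, reply = await self._mailbox.get()
            if message == "increment":
                self._count += 1
            reply.set_result(self._count)                # reply to the client
            if message == "stop":
                return

    async def ask(self, message: str) -> int:
        """Send a request to the grain and wait for its reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((message, reply))
        return await reply

async def main() -> List[int]:
    grain = CounterGrain()
    worker = asyncio.create_task(grain.run())
    replies = [await grain.ask("increment") for _ in range(3)]
    replies.append(await grain.ask("stop"))              # returns the final count
    await worker
    return replies

results = asyncio.run(main())
print(results)
```

Because the client never touches the counter directly, the same calling code would work whether the grain lives in the same process or, in a real platform, on another machine.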
The other challenge of cloud application development is elasticity: System resources should be allocated to an application on an as-needed basis, just enough to handle the current rate of requests.
“The system infrastructure in modern data centers,” Bykov explains, “supports such dynamic allocation of resources, usually at the virtual-machine level. However, except for some fairly trivial cases, it is difficult to program an elastic cloud application that can take advantage of such capabilities. Orleans provides a comprehensive solution that helps build and manage elastic applications with significantly less effort.”
As for the developers attending his demos, Bykov found them knowledgeable about the challenges of building high-scale cloud applications and appreciative of his team’s efforts to address the issues. His audience quickly understood basic concepts and the notion of grains.
“And then,” Bykov smiles, “the conversations quickly got to practical matters, such as when and how they could have Orleans to try out.”
Every first-year computer-science student is familiar with the phrase “garbage in, garbage out” (GIGO), a reminder that poor-quality data always manages to defeat brilliant coding. For data warehouses, the problem of GIGO is more acute than just data-entry errors. Because these repositories collect data from numerous independent sources, the data might exhibit different standards and naming conventions. To maintain the high-quality data their customers require for customer-relationship-management and other decision-support systems, data warehouses spend significant amounts of time and money to detect and correct “dirty” data before it is loaded into the warehouse.
The Data Management, Exploration, and Mining group at Microsoft Research Redmond is fascinated by the challenges of transforming and cleaning data. As part of the Data Cleaning project, team members Arvind Arasu, Raghav Kaushik, Surajit Chaudhuri, Kris Ganjam, and Vivek Narasayya have been working on “fuzzy joins” for SQL Server, looking at ways to discern when a given entity is inconsistently represented in multiple ways throughout an enterprise.
The goal is to find similar matches quickly, and defining similarity is a key challenge. The team has developed Microsoft Fuzzy Lookup technology, a .NET library that takes a text-based record and returns the top matches from a reference table of records that have similarity above a specified threshold. Each match record is returned with a score between 0.0 and 1.0 that indicates how similar it is to the query record.
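The shape of such a lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Fuzzy Lookup library: the function fuzzy_lookup and its parameters are hypothetical, and the similarity measure is the standard library's difflib.SequenceMatcher ratio, swapped in for whatever measure the team actually uses. Like the scores described above, the ratio falls between 0.0 and 1.0.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def fuzzy_lookup(
    query: str,
    reference: List[str],
    threshold: float = 0.6,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k reference records scoring above threshold.

    Assumption: SequenceMatcher.ratio() stands in for the real
    similarity function; it compares the records case-insensitively.
    """
    scored = []
    for record in reference:
        score = SequenceMatcher(None, query.lower(), record.lower()).ratio()
        if score > threshold:
            scored.append((record, score))
    # Best matches first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

reference_table = ["Space Needle", "Eiffel Tower", "Tower Bridge"]
matches = fuzzy_lookup("Eifel Towr", reference_table)
for record, score in matches:
    print(f"{record}: {score:.2f}")
```

A linear scan like this one compares the query with every record, which would not meet the millisecond budgets of a production system; it is meant only to show the inputs and outputs.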
The key to the usefulness of any such technology is its flexibility. For example, Fuzzy Lookup can handle matching across all types of errors, such as those shown in the following records:
The result is a domain-independent, yet customizable way of handling transformations.
“Fuzzy Lookup can customize data using domain-specific knowledge in the form of transformation rules,” Ganjam says. “For example, users can specify that ‘Robert’ and ‘Bob’ or the ‘United States of America’ and ‘USA’ are synonymous.”
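One way to picture transformation rules is as a rewriting step applied before records are compared. The Python sketch below assumes a simple token-overlap (Jaccard) similarity and a hypothetical RULES table that maps each variant to one canonical form. Neither is taken from the Fuzzy Lookup implementation; the names canonicalize and token_similarity are invented for illustration.

```python
import re
from typing import Dict

# Hypothetical rules: each variant is rewritten to a canonical form.
RULES: Dict[str, str] = {
    "bob": "robert",
    "united states of america": "usa",
}

def canonicalize(text: str, rules: Dict[str, str]) -> str:
    """Lowercase the text and replace whole-word variants with canonical forms."""
    text = text.lower()
    for variant, canonical in rules.items():
        text = re.sub(r"\b" + re.escape(variant) + r"\b", canonical, text)
    return text

def token_similarity(a: str, b: str, rules: Dict[str, str]) -> float:
    """Jaccard similarity over word tokens, after applying the rules."""
    tokens_a = set(re.findall(r"\w+", canonicalize(a, rules)))
    tokens_b = set(re.findall(r"\w+", canonicalize(b, rules)))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

left = "Bob Smith, USA"
right = "Robert Smith, United States of America"
with_rules = token_similarity(left, right, RULES)
without_rules = token_similarity(left, right, {})
print(f"with rules: {with_rules}, without rules: {without_rules}")
```

With the rules applied, both records reduce to the same three tokens; without them, only "smith" is shared, so the score drops sharply.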
Speed is just as critical to the lookup process. The latest implementation of Fuzzy Lookup was tuned to meet stringent performance requirements of two to four milliseconds per lookup, and it is being used in production by Bing Maps to match user-location queries in real time to a reference data set of more than 20 million place entities, such as “Space Needle” or “Eiffel Tower.”
What has been the most common feedback from PDC10 attendees?
“‘I need it now!’” Ganjam says. “‘When can I get this?’”
Fuzzy Lookup is an internal project, but such enthusiastic comments from developers during PDC10 have provided the team with useful validation of their work.