TrueSkill: Matchmaking Made Easy for Xbox Live
By Rob Knies
November 21, 2005 6:00 AM PT

Combined, Ralf Herbrich and Thore Graepel spend about 45 hours per week playing Xbox games. They’re both inordinately fond of Halo and Halo 2; Herbrich also likes Forza Motorsport and Tom Clancy’s Splinter Cell, while Graepel enjoys a good match of Top Spin. Both men openly describe themselves as Xbox “addicts.”

Avid gamers are highly competitive and demanding, and Herbrich and Graepel, Microsoft researchers in the Machine Learning and Perception group at Microsoft Research’s Cambridge lab, are no exception. Their passion has led them to create a new ranking and matchmaking system for online gaming that promises to offer exciting new dimensions for fellow enthusiasts.

That new system, called TrueSkill™, will be implemented in the Xbox Live online service that will accompany the Xbox 360 gaming and entertainment console, which debuts Nov. 22 in the United States, Dec. 2 in Europe, and Dec. 10 in Japan.

Herbrich and Graepel saw an opportunity to improve current ranking and matchmaking systems by applying their computing expertise to a real-world problem.

“While we were participating in the Halo 2 beta test in July 2004,” recalls Graepel, “we became aware of the problem of ranking players correctly. At the end of the day, we wanted to be ranked correctly in Halo 2. We started to work on the problem from a Bayesian learning perspective—in which the data is the game outcomes between all teams and players in single games—and developed TrueSkill.”

The name derives from the fact that TrueSkill is a skill-based ranking system. Its antecedents are many; among the most notable is the Elo system, another skill-based system, developed by Hungarian-born American physics professor Arpad Elo and commonly used to rank chess players. But there are significant differences between Elo and TrueSkill that give the latter certain advantages, particularly given the far wider range of game formats that online gaming supports:

  • The TrueSkill system, as illustrated by the TrueSkill Ranking System Calculator, uses two numbers to represent the skill of each gamer: his or her mean skill, and an estimate of how uncertain the ranking system is about that skill estimate. The mean reflects the skill level a gamer has demonstrated in the games played so far. The uncertainty starts large for a new player and steadily decreases as more game results are logged. In contrast, Elo represents each player’s skill with a single number.
  • TrueSkill is designed for the multi-player, multi-team games that are popular in online gaming. Elo was designed for two-player games such as chess and cannot rank team games or games involving more than two parties.
  • TrueSkill’s probabilistic ranking system enables effective matchmaking between opponents, leading to well-balanced games in which everyone has a near-equal chance to win.
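The two-number representation can be illustrated with a minimal sketch of a TrueSkill-style update after a single two-player game. It follows the published TrueSkill update for a win with no draw as we understand it; the default values (`MU0 = 25`, `SIGMA0 = 25/3`, `BETA = 25/6`), the helper names, and the omission of draws and of skill drift over time are assumptions for illustration, not the Xbox Live implementation.

```python
import math
from dataclasses import dataclass

# Assumed illustrative defaults; the article does not specify any parameters.
MU0 = 25.0          # initial mean skill for a new player
SIGMA0 = MU0 / 3    # initial uncertainty (standard deviation)
BETA = SIGMA0 / 2   # assumed per-game performance noise


@dataclass(frozen=True)
class Rating:
    mu: float     # mean skill estimate
    sigma: float  # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_two_player(winner: Rating, loser: Rating,
                      beta: float = BETA) -> tuple[Rating, Rating]:
    """Return updated ratings after `winner` beats `loser` (draws not modeled)."""
    c2 = 2 * beta**2 + winner.sigma**2 + loser.sigma**2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift; large for upsets
    w = v * (v + t)         # variance shrink factor, always between 0 and 1

    new_winner = Rating(
        mu=winner.mu + winner.sigma**2 / c * v,
        sigma=math.sqrt(winner.sigma**2 * (1 - winner.sigma**2 / c2 * w)),
    )
    new_loser = Rating(
        mu=loser.mu - loser.sigma**2 / c * v,
        sigma=math.sqrt(loser.sigma**2 * (1 - loser.sigma**2 / c2 * w)),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    a = b = Rating(MU0, SIGMA0)
    for game in range(1, 6):
        a, b = update_two_player(a, b)  # player A wins every game
        print(f"game {game}: A={a.mu:.2f}±{a.sigma:.2f}  B={b.mu:.2f}±{b.sigma:.2f}")
```

Running the loop shows A’s mean rising, B’s falling, and both uncertainties shrinking with every logged game. A production version would also need a numerically stable form of `v` for extreme upsets, where `_cdf(t)` underflows to zero.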

The result is a ranking system that can be applied to all Xbox Live games, with no limits on the number of teams or the number of players. Using TrueSkill, a new player joining a million-player league can be ranked accurately in fewer than 20 games, a near-optimal rate.

“We hope to improve the online experience for players on Xbox Live across all games,” Herbrich said. “TrueSkill needs a very small number of games per player to converge to their ‘true’ skills, and thus minimizes the time players need to spend in matches with opponents or partners who have skills that do not match their own skill level well.

“We have done a significant amount of testing with Halo 2 data, the results of which give us a great deal of confidence about the performance of the TrueSkill system. It is difficult to quantify the improvement over previous systems, but for those people who play Halo 2, we can say that TrueSkill will need no more than 10 games, depending on matchmaking, to estimate your level exactly right.”

It’s all about making Xbox Live fun for online gamers.

“What makes playing online games fun?” Herbrich asks. The answers are threefold:

  • A good, broadband network connection.
  • Seamless setup.
  • Competitive matches.

The last of these is where TrueSkill comes in.

“A competitive game is a fun game,” Graepel says. “The primary use of the ‘true skill’ estimate is for improved, correct matchmaking.”

TrueSkill also enables the creation of skill-balanced teams even when the players’ skill levels differ or the teams have different numbers of players. This ability rests on the assumption that a team’s aggregate skill is the sum of the skills of all the players on the team.
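The summation assumption can be made concrete with a short sketch. It assumes a Gaussian model like the one published for TrueSkill, in which a team’s performance has a mean equal to the sum of its players’ means and a variance equal to the sum of their uncertainties plus per-player performance noise. The noise value `BETA` and the helpers `team_performance` and `win_probability` are hypothetical, chosen for illustration.

```python
import math
from dataclasses import dataclass

BETA = 25 / 6  # assumed per-player performance noise (illustrative value)


@dataclass(frozen=True)
class Rating:
    mu: float     # mean skill estimate
    sigma: float  # uncertainty about that estimate


def team_performance(team: list[Rating], beta: float = BETA) -> tuple[float, float]:
    """Mean and variance of a team's total performance, summed over its players."""
    mean = sum(p.mu for p in team)
    var = sum(p.sigma**2 + beta**2 for p in team)
    return mean, var


def win_probability(team_a: list[Rating], team_b: list[Rating]) -> float:
    """Probability that team A's total performance exceeds team B's (no draws)."""
    mean_a, var_a = team_performance(team_a)
    mean_b, var_b = team_performance(team_b)
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    solo = [Rating(30.0, 2.0)]
    duo = [Rating(15.0, 2.0), Rating(15.0, 2.0)]
    print(f"{win_probability(solo, duo):.2f}")  # equal summed means -> 0.50
```

Because only the summed means and variances matter, a one-player team and a two-player team can come out evenly matched, which is the property the paragraph describes.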

There are a few other assumptions that come into play:

  • TrueSkill generates a “conservative skill estimate,” an approximation likely lower than a player’s actual skill level.
  • Only the final standings of a particular game are used to derive the rankings. How much better one player performed than another (for example, the margin of victory) is not taken into account.
  • TrueSkill determines the quality of a potential match by estimating the probability of a draw.
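One common way to form a conservative estimate is to subtract a multiple of the uncertainty from the mean, μ − kσ. The sketch below assumes k = 3, a value often associated with TrueSkill but not stated in this article, and uses it to order a hypothetical leaderboard. The names `conservative_skill` and `leaderboard` are illustrative.

```python
from dataclasses import dataclass

K = 3.0  # assumed number of standard deviations to subtract


@dataclass(frozen=True)
class Rating:
    mu: float     # mean skill estimate
    sigma: float  # uncertainty about that estimate


def conservative_skill(r: Rating, k: float = K) -> float:
    """Skill level the player very likely exceeds: mean minus k standard deviations."""
    return r.mu - k * r.sigma


def leaderboard(players: dict[str, Rating]) -> list[str]:
    """Player names sorted from highest to lowest conservative skill estimate."""
    return sorted(players, key=lambda name: conservative_skill(players[name]),
                  reverse=True)


if __name__ == "__main__":
    players = {
        "veteran": Rating(mu=30.0, sigma=1.0),   # conservative estimate 27
        "newcomer": Rating(mu=35.0, sigma=8.0),  # conservative estimate 11
    }
    print(leaderboard(players))  # ['veteran', 'newcomer']
```

The newcomer has the higher mean but is ranked lower until more games shrink the uncertainty, which is what makes the estimate conservative.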

“We call a match ‘uninteresting’ if the chances of winning for the participating players are very unbalanced,” Herbrich says. “Very few people enjoy playing a match they cannot win or cannot lose. The trick is to use the hypothetical chance of drawing with someone else. If you are likely to draw with another player, then that player is a good match for you.”
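For two players, the draw-based idea has a closed form in the published TrueSkill model. With c² = 2β² + σ₁² + σ₂², the quality of a pairing is q = √(2β²/c²) · exp(−(μ₁ − μ₂)²/(2c²)), which is roughly the chance of a draw relative to the most even match possible. The sketch below implements that formula with an assumed `BETA = 25/6`, plus a hypothetical `best_opponent` helper that picks the candidate with the highest quality.

```python
import math
from dataclasses import dataclass

BETA = 25 / 6  # assumed per-game performance noise (illustrative value)


@dataclass(frozen=True)
class Rating:
    mu: float     # mean skill estimate
    sigma: float  # uncertainty about that estimate


def match_quality(a: Rating, b: Rating, beta: float = BETA) -> float:
    """Relative draw probability for a two-player match; 1.0 is the most even match."""
    c2 = 2 * beta**2 + a.sigma**2 + b.sigma**2
    return math.sqrt(2 * beta**2 / c2) * math.exp(-((a.mu - b.mu) ** 2) / (2 * c2))


def best_opponent(player: Rating, pool: dict[str, Rating]) -> str:
    """Name of the candidate most likely to produce a draw with `player`."""
    return max(pool, key=lambda name: match_quality(player, pool[name]))


if __name__ == "__main__":
    me = Rating(28.0, 3.0)
    pool = {
        "close_match": Rating(27.5, 3.0),
        "much_stronger": Rating(40.0, 3.0),
        "much_weaker": Rating(15.0, 3.0),
    }
    print(best_opponent(me, pool))  # close_match
```

Quality drops both when the means differ and when the uncertainties are large, so two brand-new players get a modest score (about 0.45 with these assumed defaults) even though their means are equal.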

In addition to supporting skill ranking and matchmaking, TrueSkill produces skill estimates that games can publish for players to see.

“Some games,” Graepel says, “also will display skill leader boards that give you a good idea of how skilled you are in comparison to your friends.”

And there is no real way to “game” the system to obtain rankings higher than warranted.

“The only way that players can inflate their rankings,” Herbrich notes, “is to manipulate the game outcomes reported to Xbox Live. We cannot guarantee that people will not find ways to cheat, but exploitations of the ranking system itself should be a thing of the past.”

A project like TrueSkill is rarely the result of a couple of researchers working by themselves. Among those who provided significant help were Patrick O’Kelley and David Shaw from Xbox Live, as well as Chris Butcher and Roger Wolfson of Bungie Studios, who provided Halo 2 data for evaluation. The collaboration has produced a ranking system that should further enhance the Xbox Live environment.

“From a gamer’s perspective,” Graepel says, “it feels almost magical to see how the system ‘knows’ about your skill very, very soon after you start playing.

“From a researcher’s perspective, it is very cool to see technologies that were only developed a few years ago work so perfectly. We did not have to compromise in terms of mathematical rigor when developing TrueSkill, yet the system is utterly practical and useful.”

In the end, that practicality, that transfer of research from the theoretical to a product, is the real value of the TrueSkill project.

“It is one thing to theoretically develop an algorithm like TrueSkill and to publish papers about it in the scientific community,” Herbrich concludes. “But what really makes us proud is the fact that we were able to take TrueSkill to the gaming community, to the people we care about, and to make it work in practice. We really hope that TrueSkill will contribute to making Xbox 360 and Xbox Live even more exciting and fun.”

© 2005 Microsoft Corporation. All rights reserved. Microsoft, Forza Motorsport, Halo, TrueSkill, Xbox, Xbox Live and Xbox 360 are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.