TrueSkill™ Frequently Asked Questions

Here is a list of questions that gamers have sent us. We have grouped the questions into several categories linked in the right-hand column of this page. If you do not find the answer to your question, simply send an email to trueskil@microsoft.com.


General questions

Q: Why is the ranking system called the TrueSkill™ ranking system?

A: We decided to use this name because it describes the defining feature of the ranking system: it quickly identifies a gamer's true skill. The primary purpose of the TrueSkill system is to minimise the number of games necessary to find out a gamer's skill in order to optimise matchmaking.

Q: How did you compute the average number of games until convergence for the TrueSkill ranking system?

A: One way to think about the TrueSkill ranking system is that it attempts to identify the correct ordering of n players in terms of 50 skill levels. If each skill level is equally likely, a computer would need log2(50) bits of information to uniquely encode the skill level of a player. Now, assume that 2 players play a Head-to-Head game. Disregarding draws, the game outcome can provide 1 bit of information (which of the two players was the winner). Since each of these games requires 2 players, the system needs 2*log2(50) Head-to-Head games per player. Note that the particular Head-to-Head games have to be chosen such that they do, in fact, carry one bit of information. Interestingly, every match-made game whose outcome is not predictable ahead of time is guaranteed to be informative! In general, with k teams of m players each, one game outcome provides log2(k!) bits but requires k*m players per game. So, in the most general case, the system needs k*m*log2(n)/log2(k!) games per player, where n here stands for the number of skill levels (n = 50 in the Head-to-Head calculation above). And this is the equation we used in the table!
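The idealised formula can be checked with a few lines of Python. This is only a sketch of the arithmetic in the answer above; the function name and the default of 50 levels are choices made for this illustration.

```python
import math

def games_per_player(teams: int, players_per_team: int, levels: int = 50) -> float:
    """Idealised estimate k*m*log2(n)/log2(k!) from the answer above."""
    bits_needed = math.log2(levels)                    # bits to encode one player's level
    bits_per_game = math.log2(math.factorial(teams))   # bits in one ordering of k teams
    return teams * players_per_team * bits_needed / bits_per_game

for label, k, m in [
    ("2 players Head-to-Head", 2, 1),
    ("8 players Free-For-All", 8, 1),
    ("2 teams of 4 players", 2, 4),
]:
    print(f"{label}: {games_per_player(k, m):.2f} games per player")
```

For two players in a Head-to-Head game this reduces to 2*log2(50), which is a little over 11 games.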

Of course, this calculation is idealised. There are several factors that increase the number of games necessary:

  • A game does not provide a full bit of information because the performance in a particular game varies around the average skill, and the bigger this variation, the more likely it is that the less skilled player wins the game. This can lead to the loss of 75% of the information per game!
  • Between games, the TrueSkill ranking system assumes that the skill of the players may have changed slightly. In other words, the rank of each player may have changed, and extra bits are necessary to encode the change in true skill due to learning effects.

But there are also several factors that decrease the number of games necessary:

  • Each game between two teams has three possible outcomes: win, lose, draw. Knowing which of the three outcomes has been realised after a game can thus provide more than 1 bit of information. On the left hand side is a plot of the number of bits provided as a function of the chance of drawing. Obviously, if the chance of drawing is zero, we have 1 bit of information. But if a draw is the only possible outcome (chance of drawing = 100%), then no information is provided, resulting in 0 bits of information.
  • Although the rank of each player is unknown, a player is usually not equally likely to be of level 50 or level 25. In practice, the distribution of skills usually follows a bell-shaped curve (Gaussian). Thus, the number of bits needed is smaller than log2(50) = 5.64; it is actually 5.04.
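The win/lose/draw argument can be sketched as a Shannon entropy calculation. The code below assumes that a win and a loss are equally likely once a draw is ruled out (a well match-made game); that assumption and the function name are ours, not part of the original answer.

```python
import math

def outcome_bits(p_draw: float) -> float:
    """Entropy in bits of win/lose/draw when win and loss are equally likely."""
    p_win = p_loss = (1.0 - p_draw) / 2.0
    # Outcomes with zero probability contribute no information and are skipped.
    return sum(-p * math.log2(p) for p in (p_win, p_loss, p_draw) if p > 0.0)

for p in (0.0, 0.1, 1.0 / 3.0, 0.5, 1.0):
    print(f"chance of drawing {p:.2f}: {outcome_bits(p):.3f} bits")
```

Under this assumption the curve starts at 1 bit with no draws, peaks at log2(3) bits when all three outcomes are equally likely, and falls to 0 bits when a draw is certain.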

Overall, we observed in our experiments that the sum of these effects leads to an increase by a factor of 2 to 3 in the number of games necessary per gamer.

Q: What is the difference between skill and performance?

A: The TrueSkill ranking system implicitly uses a performance model that represents your (hypothetical) score in a particular game. Skill is the average performance. The TrueSkill ranking system maintains a belief in your skill and assumes that your performance in a particular game varies around your skill.
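As a rough illustration of the distinction, suppose each performance is drawn from a Gaussian centred on the player's skill. The spread `BETA` below is an assumed value chosen for this sketch, and the skills are treated as exactly known, which ignores the system's uncertainty about them.

```python
import math

BETA = 25.0 / 6.0  # assumed spread of performance around skill (illustrative only)

def win_probability(skill_a: float, skill_b: float, beta: float = BETA) -> float:
    """P(performance_a > performance_b) if each performance ~ N(skill, beta^2)."""
    # The performance difference is Gaussian with mean skill_a - skill_b
    # and variance 2 * beta^2, so we evaluate a standard normal CDF.
    z = (skill_a - skill_b) / (math.sqrt(2.0) * beta)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"equal skills:      {win_probability(25.0, 25.0):.3f}")
print(f"5 points stronger: {win_probability(30.0, 25.0):.3f}")
print(f"5 points weaker:   {win_probability(25.0, 30.0):.3f}")
```

The less skilled player still wins a noticeable share of games, which is why a single outcome carries less than a full bit about who is more skilled.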

Q: How many games do I have to win before I go up one level?

A: This depends a lot on how many games you have already played, how many games your opponents have already played and what type of games you play. It is a strength of the TrueSkill ranking system that it moves you up very quickly early on but reduces the step size of the updates after a series of consistent games. In general, the more people per team, the longer it takes to go up or down one level. But the more teams per game, the faster you can go up or down. Here is a list of game modes and the number of wins necessary before you go up a level (assuming you have already played a fair number of games; otherwise you usually go up one level in one game).

Game Mode                      Number of Games per Gamer
8 Players Free-For-All         3
4 Players Free-For-All         4
2 Players Free-For-All         7
4 Teams/2 Players per Team     5
2 Teams/4 Players per Team     10

Q: How many games do I have to lose before I go down one level?

A: These numbers are exactly the same as those given in the previous answer. The TrueSkill ranking system has no preferred direction when changing the skill belief.

Q: I have been playing a lot of unranked training games and I think I am now a much more skilled player. Will the TrueSkill ranking system be able to identify my new, higher skills? If so, how many games do I have to play before the TrueSkill ranking system knows my new skill?

A: The TrueSkill ranking system assumes a small skill change between any two consecutive games in a game mode, so it is able to identify your new, higher skill. But if your skill has changed completely (you became the best player in the world after previously being the worst player in the world), then you would need to play a large number of games. We designed the system such that it needs between 50 and 100 games to track a substantial skill increase or decrease.

Q: If I understand the TrueSkill update formula correctly then the change in μ is largest for the first few games and decreases over time. Thus, my first few games are most important; if I lose these games, it will take the TrueSkill ranking system much longer to converge to my skill. Right?

A: Not exactly. It is correct that the change in μ gets smaller and smaller with every game played, but this happens regardless of whether you win or lose. Moreover, TrueSkill always weighs more recent game outcomes more heavily than older ones. Hence, when playing against a set of players of the same skill multiple times, a late win counts more than an early win. As an example, try the following in the interactive rank calculator (we will choose Alice for the analysis and assume a draw probability of 10%):

Scenario 1: One win followed by one loss: Final TrueSkill rank = 13

Scenario 2: One loss followed by one win: Final TrueSkill rank = 16

As you can see, winning the second game rather than the first actually resulted in a skill estimate ~2.5 levels higher than winning the first game and losing the second (to be precise, it is 2.586 = 26.293 - 23.707)! Note, however, that in this example the second game is not very well match-made. If all games are perfectly match-made, then the situation reverses. The reason is that the second game is lost against a stronger opposition or won against a weaker opposition. Try it out yourself in the interactive rank calculator.
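The effect of game order can also be explored offline with a minimal sketch of the published two-player TrueSkill update for a win/loss outcome. Everything specific here is an assumption: the starting belief (μ = 25, σ = 25/3), the parameters `BETA` and `TAU`, and the hypothetical opponent "Bob" who starts with the same belief and plays both games. The results therefore need not match the interactive calculator's numbers; the sketch only illustrates the direction of the effect.

```python
import math
from statistics import NormalDist

STD = NormalDist()                        # standard normal distribution
MU0, SIGMA0 = 25.0, 25.0 / 3.0            # assumed starting belief
BETA, TAU = SIGMA0 / 2.0, SIGMA0 / 100.0  # assumed performance spread and dynamics
P_DRAW = 0.10                             # draw probability from the example
# Draw margin implied by the draw probability for a two-player game.
EPS = STD.inv_cdf((P_DRAW + 1.0) / 2.0) * math.sqrt(2.0) * BETA

def update(winner, loser):
    """Return the updated (mu, sigma) of winner and loser after a decisive game."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    var_w, var_l = s_w**2 + TAU**2, s_l**2 + TAU**2   # uncertainty grows between games
    c = math.sqrt(2.0 * BETA**2 + var_w + var_l)
    t, e = (mu_w - mu_l) / c, EPS / c
    v = STD.pdf(t - e) / STD.cdf(t - e)               # size of the mean shift
    w = v * (v + t - e)                               # shrink factor for the variance
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1.0 - var_w / c**2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1.0 - var_l / c**2 * w)))
    return new_w, new_l

def alice_after(results):
    """Alice's belief after a sequence of games against Bob (True = Alice wins)."""
    alice = bob = (MU0, SIGMA0)
    for alice_wins in results:
        if alice_wins:
            alice, bob = update(alice, bob)
        else:
            bob, alice = update(bob, alice)
    return alice

win_then_loss = alice_after([True, False])
loss_then_win = alice_after([False, True])
print(f"win then loss: mu = {win_then_loss[0]:.3f}")
print(f"loss then win: mu = {loss_then_win[0]:.3f}")
```

Under these assumptions the later win leaves Alice with the higher mean, and the two final means are mirror images around the starting value of 25.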

Q: What other ranking systems are there?

A: It is impossible to enumerate all available ranking systems here. But, in order to illustrate the wide range of systems out there, let us give a few examples:

There is an interesting article Collective Choice: Competitive Rating Systems by Christopher Allen covering some of the above ranking systems.

Q: I am a chess player and I have played online chess at the Free Internet Chess Server. They use a system called Glicko which uses rating deviations. What is the relation between the TrueSkill ranking system and the Glicko ranking system?

A: The Glicko system was developed by Mark E. Glickman, chairman of the US Chess Federation (USCF) ratings committee. To the best of our knowledge, Glicko was the first Bayesian ranking system. Like the TrueSkill ranking system, the Glicko system uses a Gaussian belief over a player's skill which can be represented by two numbers: the mean skill and the variation of the skill (called rating deviation in the context of Glicko). There are a few differences between the TrueSkill ranking system and Glicko:

  • The Glicko system (deliberately) does not model draws; instead, it updates each player as the average of a win and a loss. In the TrueSkill ranking system, draws are modelled by assuming that the performance difference in a particular game is small. Hence, the chance of drawing depends only on the difference between the two players' playing strengths. However, empirical findings in the game of chess show that draws are more likely between professional players than between beginners. Hence, the chance of drawing also seems to depend on the skill level.
  • In the Glicko system, the uncertainty in a player's skill grows linearly with time not played. In the TrueSkill ranking system, it grows by a constant amount between any two consecutive games. However, this could be changed in the TrueSkill ranking system.
  • The Glicko system uses a different performance distribution known as the logistic distribution; the TrueSkill ranking system uses a Gaussian distribution (see picture on the right). This results in two different update algorithms for two-player matches, which makes the actual update equations look different. However, conceptually both update algorithms perform very similarly.
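To get a feel for how close the two performance distributions are, the sketch below compares a Gaussian and a logistic cumulative distribution function with the same standard deviation. Matching the standard deviations is a choice made for this illustration; it is not how either rating system is parameterised.

```python
import math

def gaussian_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def logistic_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of a zero-mean logistic distribution with standard deviation sigma."""
    scale = sigma * math.sqrt(3.0) / math.pi  # logistic std = scale * pi / sqrt(3)
    return 1.0 / (1.0 + math.exp(-x / scale))

# Largest gap between the two curves on a grid from -5 to +5 standard deviations.
xs = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(gaussian_cdf(x) - logistic_cdf(x)) for x in xs)
print(f"largest difference between the two curves: {max_gap:.4f}")
```

The two curves agree at zero and differ by only a few percentage points elsewhere; the logistic distribution has heavier tails.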

So, what is the main difference from the Glicko system? Glicko was developed as an extension of Elo and was thus naturally limited to two-player matches which end in either a win or a loss. Glicko cannot update the skill levels of players if they compete in multi-player events or in teams. The logistic model would make it computationally expensive to deal with team and multi-player games. Moreover, chess is usually played in pre-set tournaments, and thus matching the right opponents was not considered a relevant problem in Glicko. In contrast, the TrueSkill ranking system offers a way to measure the quality of a match between any set of players.
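For two players, one commonly cited form of this match quality measure compares the chance of a draw with the chance of a draw between two perfectly known, equally skilled players. The sketch below implements that form under our own assumptions: the value of `beta` and the example beliefs are illustrative and are not taken from this page.

```python
import math

def match_quality(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
                  beta: float = 25.0 / 6.0) -> float:
    """Two-player match quality in (0, 1]; 1 means a perfectly even, certain match."""
    total_var = 2.0 * beta**2 + sigma_a**2 + sigma_b**2
    spread_term = math.sqrt(2.0 * beta**2 / total_var)            # penalises uncertainty
    gap_term = math.exp(-((mu_a - mu_b) ** 2) / (2.0 * total_var))  # penalises skill gaps
    return spread_term * gap_term

print(f"even match, known skills:   {match_quality(25.0, 0.0, 25.0, 0.0):.3f}")
print(f"even match, two newcomers:  {match_quality(25.0, 25.0 / 3.0, 25.0, 25.0 / 3.0):.3f}")
print(f"10-point gap, settled play: {match_quality(30.0, 1.0, 20.0, 1.0):.3f}")
```

A matchmaker could use such a score to prefer pairings whose outcome is hard to predict, which is also what makes a game informative.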

©2008 Microsoft Corporation. All rights reserved.