The TrueSkill™ ranking system is a skill-based ranking system for Xbox Live developed at Microsoft Research.
Here is a list of questions that gamers have sent us. We have grouped the questions into several categories linked in the right hand column of this page. If you do not find the answer to your question, simply send an email to email@example.com.
Q: Why is the ranking system called TrueSkill™ ranking system?
A: We decided to use this name because this is the defining feature of the ranking system: it quickly identifies a gamer's true skill. The primary purpose of the TrueSkill system is to minimise the number of games necessary to find out a gamer's skill in order to optimise matchmaking.
Q: How did you compute the average number of games until convergence for the TrueSkill ranking system?
A: One way to think about the TrueSkill ranking system is that it attempts to identify the correct ordering of the players in terms of n = 50 skill levels. If each skill level is equally likely, a computer would need log2(50) bits of information to uniquely encode the skill level of a player. Now, assume that 2 players play a Head-to-Head game. Disregarding draws, the game outcome can provide 1 bit of information (which of the two players was the winner). Since each of these games requires 2 players, the system needs 2*log2(50) Head-to-Head games per player. Note that the particular Head-to-Head games have to be chosen such that they do, in fact, carry one bit of information. Interestingly, every match-made game where the game outcome is not predictable ahead of time ensures that the game is informative! In general, with k teams of m players each, one game outcome provides log2(k!) bits but needs k*m players per game, so in the most general case the system needs k*m*log2(n)/log2(k!) games per player. And this is the equation we used in the table!
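As a rough illustration, the idealised formula can be evaluated for a few game modes. This is only a sketch: the function name `games_per_player` is made up for this example, and it assumes that n in the formula stands for the 50 skill levels used in the Head-to-Head case.

```python
from math import factorial, log2


def games_per_player(num_teams, players_per_team, skill_levels=50):
    """Idealised games per player: k * m * log2(n) / log2(k!).

    Assumption: n is the number of skill levels (50 in the FAQ).
    """
    bits_needed = log2(skill_levels)                # bits to encode one skill level
    bits_per_game = log2(factorial(num_teams))      # bits in one ranking of k teams
    return num_teams * players_per_team * bits_needed / bits_per_game


# Game modes chosen for illustration: (label, k teams, m players per team)
modes = [
    ("2 Players Head-to-Head", 2, 1),
    ("4 Players Free-For-All", 4, 1),
    ("8 Players Free-For-All", 8, 1),
    ("4 Teams/2 Players per Team", 4, 2),
    ("2 Teams/4 Players per Team", 2, 4),
]
for label, k, m in modes:
    print(f"{label}: {games_per_player(k, m):.1f} games per player")
```

For k = 2 and m = 1 the function reduces to 2*log2(50), the Head-to-Head figure derived above.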
Of course, this calculation is idealised. There are several factors that increase the number of games necessary:
- Each game is not providing 1 bit of information because the performance in a particular game varies around the average skill and the bigger this variation, the more likely it is that the less skilled player wins the game. This can eventually lead to the loss of 75% of the information per game!
- Between games, the TrueSkill ranking system assumes that the skill of the players may have changed slightly. In other words, the rank of each player may have changed, and extra bits are necessary to encode the change in true skill due to learning effects.
But, there are also several factors that decrease the number of games necessary:
- Each game between two teams has three possible outcomes: win, lose, draw. Knowing which of the three outcomes has been realised after a game thus provides more than 1 bit of information. On the left hand side is a plot of the number of bits provided as a function of the chance of drawing. Obviously, if the chance of drawing is zero we have 1 bit of information. But, if draw is the only possible outcome (chance of drawing = 100%) then no information is provided resulting in 0 bits of information.
- Although the rank of each player is unknown, there is usually not an equal chance that a player is of level 50 or level 25. In practice, the distribution of skills usually follows a bell shaped curve (Gaussian). Thus, the number of bits needed is smaller than log2(50) = 5.64; it is actually 5.04.
Overall, we observed in our experiments that the sum of these effects leads to an increase by a factor of 2 to 3 in the number of games necessary per gamer.
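The relationship between the chance of drawing and the information per game can be sketched with the entropy of the three outcomes. The function name `outcome_bits` is hypothetical, and the sketch assumes that win and loss are equally likely, which holds only for a well-matched game.

```python
from math import log2


def outcome_bits(draw_probability):
    """Entropy in bits of the win/lose/draw outcome.

    Assumption: the non-draw probability is split evenly between
    win and loss (a perfectly matched game).
    """
    p_win = p_loss = (1.0 - draw_probability) / 2.0
    outcomes = (p_win, p_loss, draw_probability)
    # Outcomes with zero probability contribute nothing to the entropy.
    return sum(-p * log2(p) for p in outcomes if p > 0)


for p_draw in (0.0, 0.1, 1.0 / 3.0, 0.5, 1.0):
    print(f"chance of drawing {p_draw:.2f}: {outcome_bits(p_draw):.3f} bits")
```

Under this assumption the curve starts at 1 bit with no draws, peaks at log2(3) (about 1.58 bits) when all three outcomes are equally likely, and falls to 0 bits when a draw is certain.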
Q: What is the difference between skill and performance?
A: The TrueSkill ranking system implicitly uses a performance model that represents your (hypothetical) score in a particular game. Skill is the average performance. The TrueSkill ranking system maintains a belief in your skill and assumes that your performance in a particular game varies around your skill.
Q: The default TrueSkill of a new player is 25, right?
A: That’s not fully correct. The TrueSkill value that is displayed in the leaderboard is the conservative estimate of a player's skill, computed from two hidden parameters that are used to track a player’s skill: the mean skill μ and the skill uncertainty σ. The TrueSkill value is then μ-3*σ. What is correct is that a new player is assigned a mean skill of μ=25 and a skill uncertainty of σ=8.333. Thus, the TrueSkill of a new player is 25-3*8.333 = 0. Note that these two choices for μ and σ effectively mean that a new player's skill can be anywhere from 0 to 50, representing a state of complete uncertainty about their skill.
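The arithmetic above is small enough to write down directly. In this sketch the function names are illustrative only, and σ is taken as exactly 25/3 rather than the rounded 8.333.

```python
def conservative_estimate(mu, sigma):
    """Displayed TrueSkill value as described above: mu - 3 * sigma."""
    return mu - 3.0 * sigma


def plausible_range(mu, sigma):
    """Interval mu +/- 3 * sigma in which the skill is believed to lie."""
    return mu - 3.0 * sigma, mu + 3.0 * sigma


# New player: mu = 25, sigma = 25/3 (quoted as 8.333 in the text)
new_mu, new_sigma = 25.0, 25.0 / 3.0
print("displayed TrueSkill:", round(conservative_estimate(new_mu, new_sigma), 3))
print("plausible skill range:", tuple(round(x, 3) for x in plausible_range(new_mu, new_sigma)))
```

The range comes out as roughly 0 to 50, which is the "complete uncertainty" described in the answer.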
Q: How many games do I have to win before I go up one level?
A: This depends a lot on how many games you have already played, how many games your opposition have already played and what type of games you play. It is a strength of the TrueSkill ranking system to move you up very quickly early on but to reduce the step-size in the updates after a series of consistent games. In general, the more people per team, the longer it takes to go up or down one level. But the more teams per game, the faster you can go up or down. Here is a list of game modes and number of wins necessary before you go up a level (assuming you have already played a fair number of games; otherwise you usually go up one level in one game).
|Game Mode||Number of Games per Gamer|
|8 Players Free-For-All|| |
|4 Players Free-For-All||4|
|2 Players Free-For-All||7|
|4 Teams/2 Players per Team||5|
|2 Teams/4 Players per Team||10|
Q: How many games do I have to lose before I go down one level?
A: These numbers exactly equal the numbers given in the last question. The TrueSkill ranking system has no preferred direction of changing the skill belief.
Q: I have been playing a lot of unranked training games and I think I am now a much more skilled player. Will the TrueSkill ranking system be able to identify my new, higher skills? If so, how many games do I have to play before the TrueSkill ranking system knows my new skill?
A: The TrueSkill ranking system assumes a small skill change between any two consecutive games in a game mode, so it is able to identify your new, higher skill. But if your skill has changed completely (you became the best player in the world from previously being the worst player in the world), then you would need to play a large number of games. We designed the system such that it would need between 50 and 100 games to track a substantial skill increase or decrease.
Q: If I understand the TrueSkill update formula correctly then the change in μ is largest for the first few games and decreases over time. Thus, my first few games are most important; if I lose these games, it will take the TrueSkill much longer to converge to my skill. Right?
A: Not exactly right. It is correct that the change in μ gets smaller and smaller with every game played, but this happens regardless of whether you win or lose. However, TrueSkill always takes more recent game outcomes into account more strongly than older game outcomes. Hence, when playing against a set of players of the same skill multiple times, a late win counts more than an early win. As an example, try the following in the interactive rank calculator (we will choose Alice for the analysis and assume a draw probability of 10%).
As you can see, winning the second game rather than the first actually resulted in a skill estimate ~2.5 levels higher than winning the first game and losing the second (to be precise, it is 2.586 = 26.293 - 23.707)! Note, however, that in this example the second game is not very well match-made. If all games are perfectly match-made, then the situation reverses. The reason is that the second game is lost against a stronger opposition or won against a weaker opposition. Try it out yourself in the interactive rank calculator.
Q: What other ranking systems are there?
A: It is impossible to enumerate all available ranking systems here. But, in order to illustrate the wide range of systems out there, let us give a few examples:
- Elo (used by the US Chess Federation and the World Chess Federation). This is a very interesting survey of more ranking systems used in chess.
- Glicko (used by the Free Internet Chess Server).
- Halo 2 Ranking System.
- Go Ranking
- Tennis rankings (used by the ATP).
- Kudos Ranking System (used in Project Gotham Racing).
There is an interesting article Collective Choice: Competitive Rating Systems by Christopher Allen covering some of the above ranking systems.
Q: I am a chess player and I have played online chess at the Free Internet Chess Server. They use a system called Glicko which uses rating deviations. What is the relation between the TrueSkill ranking system and the Glicko ranking system?
A: The Glicko system was developed by Mark E. Glickman, chairman of the US Chess Federation (USCF) ratings committee. To the best of our knowledge, Glicko was the first Bayesian ranking system. Similarly to the TrueSkill ranking system, the Glicko system uses a Gaussian belief over a player's skill which can be represented by two numbers: The mean skill and the variation of the skill (called rating deviation in the context of Glicko). There are a few differences between the TrueSkill ranking system and Glicko:
- The Glicko system (deliberately) does not model draws; instead, it treats a draw as the average of a win and a loss (per player). In the TrueSkill ranking system, draws are modelled by assuming that the performance difference in a particular game is small. Hence, the chance of drawing only depends on the difference between the two players' playing strengths. However, empirical findings in the game of chess show that draws are more likely between professional players than between beginners. Hence, the chance of drawing also seems to depend on the skill level.
- In the Glicko system, the uncertainty in a player's skill grows linearly with time not played. In the TrueSkill ranking system, it grows by a constant amount between any two consecutive games. However, this could be changed in the TrueSkill ranking system.
- The Glicko system uses a different performance distribution known as the logistic distribution; the TrueSkill ranking system uses a Gaussian distribution (see picture on the right). This results in two different update algorithms for two player matches which make the actual update equations look different. However, conceptually both update algorithms perform very similarly.
So, what is the difference to the Glicko system? Glicko was developed as an extension of ELO and was thus naturally limited to two player matches which end in either win or loss. Glicko cannot update skill levels of players if they compete in multi-player events or even in teams. The logistic model would make it computationally expensive to deal with team and multi-player games. Moreover, chess is usually played in pre-set tournaments and thus matching the right opponents was not considered a relevant problem in Glicko. In contrast, the TrueSkill ranking system offers a way to measure the quality of a match between any set of players.
Q: I am always playing together in the same team with my friend JoeDoe. Will the TrueSkill ranking system be able to differentiate between us two in terms of skills? In other words, is the TrueSkill ranking system capable of finding that I am the more skilled player of us two?
A: If both you and your friend only play ranked team games together then the TrueSkill ranking system cannot distinguish between you two; it always compares the teams' skills (sums of the players' skills in the teams) and 'distributes' the gain/loss in proportion to the individual players' uncertainties (see detailed description). But note: if your friend also plays team games with anyone other than you then the TrueSkill ranking system will be able to identify the more skilled player of you two. Also, if both of you always only play together, you might consider forming a clan.
Q: Why does it take so many more games until convergence if I play a team game as opposed to a Free-for-All game?
A: The problem is that very little information about the individual player's skill is contained when only exploiting which of two teams wins or if the two teams draw. This is effectively only up to 1.6 bit of 'information' that needs to be 'shared' between all players participating in the game. More specifically, consider these two scenarios:
- Alice, Bob, Chris and Darren play a 4-player-Free-for-All game and Alice wins against Bob wins against Chris wins against Darren. This game outcome provides a lot of information: it's fair to say that probably Alice is better than Bob, Alice is better than Chris, Alice is better than Darren, Bob is better than Chris, etc.
- Alice and Bob play against Chris and Darren in a 2-Teams-2-Player-per-Team game and Alice and Bob win against Chris and Darren. Can we still say that this means that Alice is better than Chris and Alice is better than Darren? No! All we can confidently assert is that Alice and Bob together are better than Chris and Darren together. So, the team game outcome provides only knowledge about an individual's skill in conjunction with all the other team members.
Q: How will a team killer be ranked in the TrueSkill ranking system?
A: In the TrueSkill ranking system, the team skill is the sum of the skills of all players in the team. The TrueSkill ranking system has the potential to assign a negative skill to a player; if such players are added to a team, then the skill of the team goes down (because a team killer reduces the chance to score against the other team and might even inflict negative points by suicide). Fortunately, the TrueSkill ranking system's matchmaking procedure will eventually ensure that team killers will only play each other. And this can only be a good thing.
Q: I am playing a team game and all the players in my team drop out of the game. Of course, I lose the game. Will I lose as many skill points as all the people who left me standing in the rain?
A: Unfortunately, yes. All alternative options are possible exploits for cheating:
- If the TrueSkill ranking system does not count the game at all then the losing team can always ensure not to lose points by dropping out early (entirely).
- If the TrueSkill ranking system only uses the team configurations at the end of the game then both the players that dropped would not be penalised and the remaining player can be arbitrarily boosted (that is, shortly before the end of the game all but one player drop from a team; for the update equation it would now seem that a single player has won against a team of, say, 4 players and would apply a massive positive update).
- If the TrueSkill ranking system introduced an arbitrary lowest rank into which every player falls who drops before the end of the game, then, again, the remaining player(s) in a team could be arbitrarily boosted (they won against the losing team and all the players that dropped). This approach would penalise the players that drop, though.
But: Players who drop regularly from a team would eventually be identified by the TrueSkill ranking system as having a negative impact on the team skill and will eventually be matched with other players that have a negative team impact. So, you should not see this happening too often if you are a player of average skill.
Q: You are saying that the TrueSkill ranking system assumes that the skill of a team is the sum of the skills of its players. I think this model is not appropriate: I am usually playing much better with people from my friends list rather than with random players. Will this assumption lead to incorrect rankings?
A: The assumption that the team skill is the sum of the skills of its players is exactly that: an assumption. The TrueSkill ranking system will use the assumption to adapt the skill points of individual players such that the team outcome can be best predicted based on the additive assumption about the skills. Provided that you and your friends also play team games with other players now and then, the TrueSkill ranking system will assign you a skill belief that is somewhere between the skill when you are playing with your friends and the skill when you are playing as an individual. So, in the worst case, every other game is not with your friends: then you are ranked slightly too high when you play with random team players and slightly too low when you play with your friends. But if you mostly play with your friends, the system will identify your skill correctly for most of your games.
Q: Why can two players in a party not be in two different teams?
A: This would open up the possibility of cheating. You could, for example, arrange to play each other and have your friend always forfeit the game. This would not boost you to the top of the league (try it out with our advanced interactive ranking calculator; press the After->Before button and Recalculate) but it would increase your skill level artificially. The TrueSkill ranking system always assumes that the game outcome is a result of your skills (in the game) and not of your skills to cheat.
Q: Does the TrueSkill ranking system reward individual players in a team game?
A: The only information the TrueSkill ranking system will process is:
- Which team won?
- Who were the members of the participating teams?
The TrueSkill ranking system takes into account neither the underlying exact scores (flag captures, kills, time etc.) for each team nor how well each particular team member performed. As a consequence, the only way players can influence their skill updates is by promoting the probability that their team wins. Hence, "ball bitches", "hill whores", "flag fruits", "territory twits", and "bomb bastards" will hurt their individual TrueSkill ranks unless what they are doing helps their team. Obviously, it is difficult to update individual players' skills from team results only. To understand the difficulty and the solution consider the following analogy: Suppose you have four objects (players), each having an unknown weight (skill). Suppose further that you have a balance scale (game) to measure weight (skill) but are always only allowed to put two objects on each side of the balance. If you always combine the same pair of objects, the only information you can get is which pair of objects is heavier. But if you recombine the objects into different pairs you can find out about their individual weights. As a consequence, the TrueSkill ranking system will be able to find out about individual players' skills from team outcomes, provided that players do not play in one and the same team all the time but in varying team combinations.
Q: I bought a 360 for my son for Xmas, and both of us have become seriously addicted to Halo 3 on Xbox Live, particularly Team Slayer matches. Basing the skill change only on the team performance yields pretty counterintuitive results. For example, I often play a string of team slayer games where I am MVP (Most Valuable Player), which means I outscore everyone. But if my team loses those games, I gain no skill. Then, I can play poorly, but if my team wins I gain skill. This lack of feedback from individual performance is frustrating and makes your skill level beholden to the performance of the rest of your team, which is usually not under your control unless you explicitly team up with friends.
A: Great that you are enjoying your 360 and Halo 3.
The question you are asking has indeed been raised by quite a few people and we had many discussions about it. However, we always return to our point of view that in a team game the only way to assess someone's skill towards the team objective is to consider the team objective only. Any auxiliary measurements such as number of flags carried, number of kills, kill-death spread, etc, all have the problem that they can be exploited thereby compromising the team objective and hence the spirit of the game. If flag carries matter, players will rush to the flag rather than defend their teammates or their own flag. Some may even kill the current flag carrier of their own team to get the flag. If it is number of kills, people will mindlessly enter combat to maximise that metric. If it is K-D spread they may hold back at a time when they could have saved a team mate. Whichever metric you take, there will be people trying to optimise their score under that metric and this will lead to distortions.
Another problem is, of course, that we would like to use the skill ratings for matchmaking. The current system essentially aims at a 50:50 win-loss ratio for each team. It is unclear how individual skill ratings based on individual achievements would change the calibration of such a system.
Of course, one might use a weighted combination of team and individual measurements. However, whenever individual measurements enter the equation there will be trouble, maybe less trouble if the weight is less, but that is not really good enough.
Q: If the skill of every player is represented by two numbers, how is it possible to rank players in a leaderboard?
A: The TrueSkill ranking system uses the so-called conservative skill estimate which is the 1% quantile of the belief distribution: it is extremely likely (to be precise, with a belief of 99%) that the player's actual skill is higher than the conservative estimate. Have a look in the detailed description.
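To check how the 99% belief relates to the μ-3*σ rule, one can compute the Gaussian tail mass directly. This is a sketch that assumes the belief is exactly Gaussian and that the conservative estimate is μ-3*σ; the function name is made up for the example.

```python
from statistics import NormalDist


def belief_above_conservative(mu, sigma, k=3.0):
    """Probability, under a Gaussian belief N(mu, sigma), that the
    actual skill lies above the conservative estimate mu - k * sigma."""
    belief = NormalDist(mu, sigma)
    return 1.0 - belief.cdf(mu - k * sigma)


# The result does not depend on mu or sigma, only on k.
p = belief_above_conservative(mu=25.0, sigma=25.0 / 3.0)
print(f"belief that skill exceeds mu - 3*sigma: {p:.4%}")
```

Under these assumptions the belief comes out at about 99.87%, so the 99% quoted above is a safe, slightly rounded statement.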
Q: How can I become the top player in a leaderboard?
A: It's simple: Win games! The TrueSkill ranking system matches you with people of similar skill so winning against them will always bring you up the leaderboard.
But, more seriously, in order to become a level 50 player in an 8-player-Free-for-All game mode you will only need to win 8 (tightly) match-made games in a row! If you do not believe that, try out our interactive rank calculator: Always make sure that Alice wins the game and all other 7 players have the same μ but a σ of 1 (you can use the After->Before button). After 8 games you should see that Alice's μ is 56.995 and the σ is 1.901; hence, the conservative skill estimate would be Level 51! It may take a bit longer in reality because in these calculations it was assumed that there are always enough players at every playing strength available.
Q: Who is the better player: Someone with a large μ and a large σ or a small μ and a small σ?
A: The answer to this question is not straightforward. For someone with a large σ the TrueSkill ranking system is still uncertain about the skill. Thus, the player with the large μ and a large σ may be better. The best way to find out is to ask the player with the large σ to play more.
Q: I am a level 30 player with a σ of 5 and my friend is a level 28 player with a σ of 2. Why does the TrueSkill ranking system claim that my friend is better; at the end of the day, my level is higher?
A: That is correct. But, you have not played enough games yet for the TrueSkill ranking system to confidently know that you are better; so conservatively speaking, your level is probably 15 = 30 - 3 * 5 whereas your friend's conservative estimate is level 22 = 28 - 3 * 2.
Q: A couple of days ago I managed to get into the top 350 (in PGR 3 online career) after winning probably 25 of 30 races and that brought me up about 120 spots. Now tonight I have had 5 races: 2 wins, 1 second, a 5th (got spun twice) and a 4th on one of the Vegas tracks. Because of this pathetic record (that is how the TrueSkill formula sees it) I have gone down 115 spots. How is it fair that 2 bad races basically dropped me down almost as many points as 25 wins out of 30 races took to gain all those places?
A: There are two reasons that can cause this problem (although the latter is probably more responsible for this "phenomenon"):
- Ranks displayed in PGR 3 are the position in the total leaderboard. That means, if you are rank 659 then there are 658 gamers with a higher skill (estimate) than you. This number can vary without a gamer actually having to play a game; for example, if some (legitimate) "Gotham star" gets to the top 100 players in the world whilst you are not even racing, then your rank goes down to "660" without you doing anything wrong. This "rank" can never be guaranteed to be "stable".
- Roughly speaking, the change in your skill estimate depends on how "surprising" the game outcome is. If you happen to be (among) the player(s) with the highest skill in each of the games you played, then the 25 wins were not surprising and hence none of these games provided a significant increase in your skill estimate. However, if coming 5th was a rather unlikely outcome in the game where you actually did come fifth, then your skill needs to be adapted significantly. Another way of seeing the issue is that TrueSkill does take the strength of the opposition into account. One cannot simply compute the win ratio and equate this with skill; if all wins happen in the (sometimes) unavoidable unbalanced games then a win is not really testament to your (even) high(er) skill!
Q: Well there must be a bug in the system because I jumped into a 4 person race with 3 lower ranked individuals, won the race and my position in the league I was in dropped about 50 spots.
A: So, what is going on here? Between any two games of a gamer, the TrueSkill ranking system assumes that the true skill of a gamer, that is, μ, can have changed slightly either up or down; this property is what allows the ranking system to adapt to a change in the skill of a gamer. Technically, this is achieved by a small increase in the σ of each participating gamer before the game outcome is incorporated. Usually, a game outcome provides enough pieces of information to reduce this increased uncertainty. But, in a badly matched game (as the one described above) this is not the case; in this case, nothing can be learned about the winner from the game outcome (because it was already known before the game that the winner was significantly higher ranked than the other gamers he has beaten). So, conservatively speaking, the winner's skill might have slightly decreased! Note that this can only happen if the gamer is not matched correctly so that he can "prove" to the TrueSkill ranking system that his skill has not changed.
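The effect can be reproduced with a small sketch of a two-player win/loss update. Several things here are assumptions and not taken from this FAQ: the update equations are the standard two-player Gaussian update without draws as commonly published for TrueSkill, and the values of `BETA` (performance variation) and `TAU` (uncertainty added between games) are commonly quoted defaults. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
BETA = 25.0 / 6.0    # assumed performance standard deviation
TAU = 25.0 / 300.0   # assumed uncertainty added before each game


def update_winner(mu_w, sigma_w, mu_l, sigma_l):
    """Return the winner's (mu, sigma) after a two-player game, ignoring draws."""
    # Step 1: inflate both uncertainties slightly before using the outcome.
    var_w = sigma_w ** 2 + TAU ** 2
    var_l = sigma_l ** 2 + TAU ** 2
    # Step 2: incorporate the outcome "winner beat loser".
    c = sqrt(2.0 * BETA ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)   # small when the win was expected
    w = v * (v + t)
    new_mu = mu_w + var_w / c * v
    new_sigma = sqrt(var_w * (1.0 - var_w / c ** 2 * w))
    return new_mu, new_sigma


# A well-established strong player beats a well-established weak player.
mu, sigma = update_winner(40.0, 1.0, 10.0, 1.0)
print("before:", 40.0 - 3.0 * 1.0)
print("after: ", mu - 3.0 * sigma)
```

In this badly matched example the win barely moves μ, while the small increase in σ is not recovered, so μ-3*σ ends up slightly lower than before the game.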
Q: In Dawn of War II, I won a game and went down in TrueSkill. What happened?
A: Usually your TrueSkill rises after a win – however, in Dawn of War II the displayed TrueSkill lags behind one game. (Thanks to CheeseNought for reporting the problem)
Q: Is it at all possible to view the TrueSkill rating of an individual Xbox Live Gamertag? Is there a website that I can go to, to see the ratings of people's
A: Most Xbox 360 games have a leaderboard function where you can find your TrueSkill; in fact, starting May 2006 some games have also provided web access to gamers' TrueSkill rating. However, there are a few exceptions, most notably with the game Call of Duty 2. At the moment, there is no way to find out about your TrueSkill in this game.
Q: My favourite game mode is Online Career in Project Gotham Racing 3. How can the TrueSkill ranking system find players of similar skill based on the chance of drawing when it is impossible to draw with someone else in a racing game?
A: When the TrueSkill ranking system computes the match quality of other players, it computes the (hypothetical) probability of draw between you and every other player relative to the probability of drawing between two equally skilled players; this ensures that the ratio is always between 0 and 1. This number would depend on the draw margin and thus the match-quality criterion of the TrueSkill ranking system is actually computing this ratio in the limit of a draw margin of zero! This gives the match quality equation specified in the detailed description.
In other words: The TrueSkill ranking system is not taking into account the chance of drawing for a given game mode! Thus, it does not matter that your game mode has zero chance of drawing.
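The limiting ratio described above can be sketched for two players. This is a derivation under stated assumptions rather than a copy of the equation in the detailed description: performances are taken to be Gaussian around the skill with standard deviation `BETA` (an assumed value), and the reference pair is two players of equal, perfectly known skill. The function name is hypothetical.

```python
from math import exp, sqrt

BETA = 25.0 / 6.0  # assumed performance standard deviation


def match_quality(mu1, sigma1, mu2, sigma2, beta=BETA):
    """Draw-density ratio in the limit of a zero draw margin.

    The performance difference is Gaussian with mean mu1 - mu2 and
    variance 2*beta^2 + sigma1^2 + sigma2^2; its density at zero is
    divided by the density at zero for two equal, fully known players.
    """
    two_beta_sq = 2.0 * beta ** 2
    total_var = two_beta_sq + sigma1 ** 2 + sigma2 ** 2
    return sqrt(two_beta_sq / total_var) * exp(-((mu1 - mu2) ** 2) / (2.0 * total_var))


newcomer = (25.0, 25.0 / 3.0)
print("vs. another newcomer:      ", round(match_quality(*newcomer, 25.0, 25.0 / 3.0), 3))
print("vs. established average:   ", round(match_quality(*newcomer, 25.0, 1.0), 3))
print("vs. established top player:", round(match_quality(*newcomer, 45.0, 1.0), 3))
```

The ratio always lies between 0 and 1 and does not involve the draw probability of the game mode, which is why a racing game with no draws can still use it.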
Q: I am playing my first ranked game in a game mode. Will I be matched more likely with another player new to the game mode or with someone else?
A: When you play your first ranked game in a game mode, the TrueSkill ranking system assigns you a mean skill level μ in the middle of the leaderboard but a maximal variance σ² of skills; it's your first game so the ranking system should reflect its lack of knowledge. Now, the TrueSkill ranking matchmaking criterion takes its maximal value for other players with the same mean skill level μ but a small variance σ². Thus, if available, you will be matched with another player in the middle of the leaderboard but with a much smaller σ²: a player of established average skill.
Why is this better than matching you with someone else new to the game? Well, this other player may, in fact, be one of the most skilled players (who just happened not to have played the game mode yet) whereas you really are a beginner. Then, you two are (up to) 50 skill levels apart. Matching you with someone who is an established average player guarantees that your skill level gap is never bigger than 25 levels.
Q: I have been playing my first game in PGR3 online career last night. I was matched with a couple of Level 22/Contender players. That does not seem right, what's going on here?
A: The rank that is displayed in the PGR 3 online career lobby is "the conservative skill estimate"; with a chance of 99% your skill is larger than this number. More specifically, the rank is computed by "mean skill - 3 * uncertainty" but, as far as TrueSkill is concerned, your skill is anywhere between "mean - 3 * uncertainty" and "mean + 3 * uncertainty". So, when you are displayed as "Unranked", your mean skill is really 25 and the uncertainty is so large that your skill can be anywhere between 0 and 50. However, in matchmaking you get matched with people based on your "mean skill". Hence you will see large gaps in the matchmaking lobby. That does not mean you are matched badly, though. You are matched as well as it is possible given the information that TrueSkill has about you and in light of all the lobbies that are available to join when you request it.
Q: In PGR3, I am having a hard time understanding why I (novice level 12) consistently get matched with players in mid to high 20's. Yesterday I had to race a 29, 22, and a 17. And that is just the one example. It seems that the range for matching part is a little too liberal.
A: There are several effects that can lead to your observation:
- There are not enough players around for the TrueSkill system to choose from at the moment when you are searching for a new game.
- If you have not played enough games (that is, the uncertainty that TrueSkill has in your skill is still large) then your conservative skill estimate as shown by PGR3 is exactly this: a conservative skill estimate. In other words, your displayed level 12 could be anything from, let's say, level 12 to level 28.
- If your skill is very high or very low, there are usually far fewer players in this skill range (see answer to next question). However, this is probably not the case for level 12.
One last note: Rest assured that once there are enough active Live players around in your preferred game mode, the matchmaking will become much tighter. Also, the skill learning is not affected by a bad match; in fact, if you are matched with much stronger players there is nothing to lose with respect to your TrueSkill skill; the best thing that can happen is that you pull off a win and move up the skill leaderboard by a large amount.
Q: I am among the top 100 players in the world in my game mode. Why do I usually wait longer in the matchmaking lobby than my friend JoeDoe who is an average skill player?
A: This has an easy explanation: There are simply not enough players of your calibre available at any time! Remember that Xbox Live is a worldwide service, so there are perhaps only 1000 players who would be a perfect match for you, and they live in 24 different time zones. The only alternative is to match you with players who are much less skilled, sacrificing match quality to shorten the waiting time. And this would ruin both their and your experience on Xbox Live. You see: being a top player has its price!
For example, on the right hand side you see a plot of the distribution of the mean skill levels μ for a popular Xbox Live game. As you can see, there are very few players of skill level 40 and above and 5 and below so the chance that an arbitrary other player online at the moment is a good match is much smaller. This results in the longer waiting time.
Q: I am a player with a mean skill of 30 and a skill variance σ² of 4 but my friend is only a player with a mean skill of 10 and a skill variance σ² of 2. If we play as a party, what people will we be matched with?
A: If you play as a party, the mean skill of every party member will be the average of all the mean skills and the skill variance is the average of the skill variances of all party members. Thus, for the purpose of matchmaking only, your mean skill will be 20 and your skill variance will be 3; the same is true for your friend. Hence, together you make a team of skill 40 = 2 * 20 with a joint skill variance of 6 = 4 + 2. But, when you finish a game the update will use your actual mean skill and skill variance; thus, your mean skill will grow/shrink faster (why?) depending on the outcome of the game.
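The party example works out as follows. This is a minimal sketch of the averaging and summing described in the answer; the function names are made up for illustration, and only the matchmaking values are computed, not the post-game update.

```python
def party_matchmaking_values(members):
    """members: list of (mean skill, skill variance) pairs, one per party member.

    For matchmaking only, every member is treated as having the party's
    average mean skill and average skill variance.
    """
    n = len(members)
    avg_mu = sum(mu for mu, _ in members) / n
    avg_var = sum(var for _, var in members) / n
    return avg_mu, avg_var


def team_totals(members):
    """Team skill is the sum of the averaged member skills; variances add too."""
    avg_mu, avg_var = party_matchmaking_values(members)
    n = len(members)
    return n * avg_mu, n * avg_var


party = [(30.0, 4.0), (10.0, 2.0)]  # you and your friend, from the question
print(party_matchmaking_values(party))  # (20.0, 3.0)
print(team_totals(party))               # (40.0, 6.0)
```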
Q: I keep getting matched with people of higher TrueSkill and losing badly, which is very frustrating. Why does this happen?
A: There are several effects that may be at work here:
- There is an inherent conflict between waiting time for a match and match quality: in a real-time system, the longer we wait during matchmaking, the higher the chance of finding a closely matched player.
- The TrueSkill matchmaking support that is currently available for games on Xbox Live is based on a host-client model: during the matchmaking process, a player decides either to host a session (“host”) or to search for and possibly join a session (“client”). This decision is either put in the hands of gamers (as in Call of Duty 2) or made automatically behind the scenes (as in Halo 3). TrueSkill comes into play during the search for a session: the list of returned hosts is always sorted in decreasing order of match quality. However, the list is not filtered by match quality, and nothing forces the session at the top of the list to be picked. Thus, in off-peak hours or in situations where there are not enough host sessions available, the match quality can suffer and you may get matched with people of much higher or lower TrueSkill.
- The match quality is effectively measuring how far players are apart in terms of their mean skill level μ – however, the TrueSkill that gets displayed during matchmaking is the conservative skill estimate μ – 3*σ. Thus, the mismatch in terms of conservative skill estimates might look a lot worse than the actual mismatch. Here is an example:
- Setting 1: a game between a new player and an established level 25 player. The match quality is 57.6%, though the displayed skill difference is a staggering 23 levels!
- Setting 2: a game between a new player and an established bad player. The match quality is 5.7%, though the displayed skill difference is only 1 level.
- Note also that the system can learn a lot more about the skill of a new player in setting 1 than in setting 2 (both in terms of the mean skill level μ and skill uncertainty σ).
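The contrast between displayed gap and match quality can be illustrated with a small Python sketch. Several things here are assumptions that this FAQ does not state: the match quality formula is the draw-probability-based one from the published TrueSkill description, β = 25/6 is the commonly used default for a 0 to 50 scale, and the three players are hypothetical. The sketch is not meant to reproduce the exact percentages quoted above, only the effect.

```python
import math

BETA = 25.0 / 6.0  # assumed per-game performance variation; not given in this FAQ


def displayed_level(mu, sigma):
    """Conservative skill estimate shown during matchmaking."""
    return mu - 3.0 * sigma


def match_quality(mu1, sigma1, mu2, sigma2, beta=BETA):
    """Head-to-head match quality under the assumed draw-probability formula."""
    c2 = 2.0 * beta**2 + sigma1**2 + sigma2**2
    return math.sqrt(2.0 * beta**2 / c2) * math.exp(-((mu1 - mu2) ** 2) / (2.0 * c2))


# Hypothetical players, chosen only for illustration.
new_player = (25.0, 25.0 / 3.0)  # large uncertainty, displayed near level 0
established = (26.5, 0.5)        # displayed at level 25
weak_veteran = (3.0, 0.5)        # displayed at level 1.5

for name, (mu, sigma) in [("established", established), ("weak veteran", weak_veteran)]:
    gap = abs(displayed_level(mu, sigma) - displayed_level(*new_player))
    quality = match_quality(*new_player, mu, sigma)
    print(f"{name}: displayed gap {gap:.1f}, match quality {quality:.1%}")
```

Under these assumptions the pairing with the large displayed gap is the better match, because the two mean skills are close, while the pairing with the small displayed gap is the poor one.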
Q: Can the TrueSkill ranking system cope with handicapped games?
A: No. Among other things, this is something we are working on right now. The TrueSkill ranking system assumes that two equally skilled teams have the same chance of winning.
Q: Can the TrueSkill ranking system identify cheaters?
A: No. The only thing the TrueSkill ranking system can do is to track the plausibility of game outcomes. If you happen to play a lot of games whose outcomes are not very plausible, then this could raise concerns about you. But it could also mean that you are a very adaptive player whose skill is growing faster than the TrueSkill ranking system anticipated. And the last thing you want to be called then is a cheater!
Q: Can you please extend the rank calculator to more than 8 players? We are running a league and we would like to use TrueSkill to rank players on results based on player matches between our members. Some Xbox 360 games allow more than 8 players in non-ranked matches.
A: No, we currently have no plans to extend the rank calculator to more than 8 players; the user interface becomes significantly more difficult and would need a complete re-design to cope with this much information. We may revise this decision and we will let you know.
However, if you only want to rank two-team games, we already have a solution for you: a Microsoft Excel spreadsheet that can compute the update for up to 16 vs. 16 player matches with the rank calculator, exploiting the property that the skill of a team is the sum of the skills of its players (once there is an Xbox 360 game with more than 32 players, we will update the spreadsheet to cope with even bigger teams). Here are the steps to follow:
- Put the μ and σ values of all (up to 32) players into the cells C6 to F21.
- Put the draw probability that you would like to use into cell C23.
- Put the 5 numbers in cells D32 to D36 into the rank calculator (Head-to-Head scenario).
- Press Recalculate Skill Level Distribution.
- Copy the 4 numbers on the right of the rank calculator into the cells E32 to E35.
- Now you can read off the new μ and σ values of all players from cells H6 to K21.
The accuracy is only up to 3 digits but that should be sufficient for up to 1,000 players. If you intend to rank bigger leagues, please contact us directly at firstname.lastname@example.org.
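The property that the spreadsheet exploits can be sketched as follows: each team is collapsed into a single Head-to-Head participant. This is only an illustration, not the spreadsheet's actual formulas. The function name is hypothetical, the league players are made up, and adding the variances assumes that the players' skills are independent. How the spreadsheet distributes the update back to individual players is not described here and is not attempted.

```python
import math


def team_as_single_player(players):
    """Collapse a team into one head-to-head participant.

    players: list of (mu, sigma) pairs. The team mean is the sum of the
    means; the team sigma assumes independent skills, so the variances add.
    """
    team_mu = sum(mu for mu, _ in players)
    team_sigma = math.sqrt(sum(sigma**2 for _, sigma in players))
    return team_mu, team_sigma


# Hypothetical 3 vs. 3 league match.
team_a = [(25.0, 8.0), (30.0, 2.0), (20.0, 4.0)]
team_b = [(27.0, 3.0), (22.0, 6.0), (24.0, 2.0)]

print(team_as_single_player(team_a))
print(team_as_single_player(team_b))
```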
Q: I am interested to study ranking systems. Do you have any real-world data for a comparative analysis?
A: Yes, we recently released the Halo 2 Beta game outcome dataset. We are very interested in your work and would like to learn about your results; please feel free to contact us at email@example.com with any findings you have.
Q: I am a software developer and am eager to develop a small application that mimics your TrueSkill Rank Calculator. Would it be possible for you to provide me with an implementation of that application (since it was meant for research purposes, I do not see the harm) or at least pseudocode for its implementation?
A: We do not intend to make available the source code of the TrueSkill Rank Calculator in the near future. Of course, we would like to encourage you to pursue research in the subject area so here is a list of pointers that might be of help (this list will be regularly updated if new material can be released):