The TrueSkill™ ranking system is a skill-based ranking system for Xbox Live developed at Microsoft Research. The purpose of a ranking system is to identify and track the skills of gamers in a game (mode) so that they can be matched into competitive games. The TrueSkill ranking system uses only the final standings of all teams in a game to update the skill estimates (ranks) of all gamers who played in it. Ranking systems have been proposed for many sports, but possibly the most prominent ranking system in use today is Elo.
So, what is so special about the TrueSkill ranking system? In short, the biggest difference from other ranking systems is that in the TrueSkill ranking system skill is characterised by two numbers:
- The average skill of the gamer (μ in the picture).
- The degree of uncertainty in the gamer's skill (σ in the picture).
The ranking system maintains a belief about every gamer's skill using these two numbers. If the uncertainty is still high, the ranking system does not yet know the gamer's skill exactly. In contrast, if the uncertainty is small, the ranking system has a strong belief that the gamer's skill is close to the average skill.
On the right-hand side, a belief curve of the TrueSkill ranking system is drawn. For example, the green area is the belief of the TrueSkill ranking system that the gamer has a skill between level 15 and 20.
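The area under a belief curve can be computed directly. The following is a minimal sketch, assuming the belief is a Gaussian (normal) curve with mean μ and standard deviation σ; the function names and the example values of μ and σ are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def belief_between(lo, hi, mu, sigma):
    """Belief that the skill lies between lo and hi.

    Assumes a Gaussian belief curve with mean mu and standard deviation sigma.
    """
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


# Hypothetical gamers: one with high uncertainty, one with low uncertainty.
uncertain = belief_between(15, 20, mu=25.0, sigma=25.0 / 3)
confident = belief_between(15, 20, mu=18.0, sigma=1.5)

print(f"high uncertainty: {uncertain:.3f}")
print(f"low uncertainty:  {confident:.3f}")
```

With a small σ, most of the belief is concentrated near μ, so a narrow interval around μ captures most of the area under the curve.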
Maintaining an uncertainty allows the system to make big changes to the skill estimates early on but small changes after a series of consistent games has been played. As a result, the TrueSkill ranking system can identify the skills of individual gamers from a very small number of games. The following table gives an idea of the average number of games per gamer that the system ideally needs to identify the skill level:
| Game Mode | Number of Games per Gamer |
|---|---|
| 16 Players Free-For-All | |
| 8 Players Free-For-All | |
| 4 Players Free-For-All | |
| 2 Players Free-For-All | |
| 4 Teams/2 Players Per Team | |
| 4 Teams/4 Players Per Team | |
| 2 Teams/4 Players Per Team | |
| 2 Teams/8 Players Per Team | |
The actual number of games per gamer can be up to three times higher depending on several factors, such as the variation in performance from game to game, the availability of well-matched opponents, and the chance of a draw. If you want to learn more about how these numbers are calculated and how the TrueSkill ranking system identifies players' skills, please read the Detailed Description of the TrueSkill™ Ranking Algorithm or consult the Frequently Asked Questions.
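To illustrate why maintaining an uncertainty leads to big changes early on and small changes later, here is a hedged sketch of a two-player update for a game with a clear winner. It assumes Gaussian skill beliefs, no draws, and no increase of uncertainty between games; the performance variation `BETA`, the function names, and the example ratings are assumptions made for this illustration, not values taken from this page.

```python
import math

BETA = 25.0 / 6  # assumed per-game performance variation (hypothetical value)


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_win(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`.

    Simplified sketch: two players, no draw margin, no dynamics term.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # how surprising the outcome was
    w = v * (v + t)       # how much the uncertainty shrinks
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# A newcomer (high uncertainty) beats a veteran (low uncertainty).
newcomer, veteran = (25.0, 25.0 / 3), (25.0, 1.0)
new_newcomer, new_veteran = update_win(newcomer, veteran)
print("newcomer:", new_newcomer)
print("veteran: ", new_veteran)
```

Each player's mean moves in proportion to that player's own variance, so the newcomer's estimate shifts by several points while the veteran's barely changes.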
If you play a ranked game on Xbox Live, the TrueSkill ranking system will compare your individual skill (the numbers μ and σ) with the skills of all the game hosts for that game mode on Xbox Live and automatically match you with players with skill similar to your own. But how can this be done when every player's skill is represented by two numbers? The trick is to use the (hypothetical) chance of drawing with someone else: If you are likely to draw with another player then that player is a good match for you! Sounds simple? It is!
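The idea of matching by draw chance can be sketched for two players as follows. This is an illustrative approximation, assuming Gaussian skill beliefs and a draw margin that approaches zero, scaled so that a perfect match scores 1; the `beta` value, the function names, and the candidate ratings are hypothetical.

```python
import math


def match_quality(player, opponent, beta=25.0 / 6):
    """Relative chance of a draw between two (mu, sigma) skill beliefs.

    Returns a value in (0, 1]; higher means a closer, better match.
    """
    mu1, sigma1 = player
    mu2, sigma2 = opponent
    total_var = 2.0 * beta ** 2 + sigma1 ** 2 + sigma2 ** 2
    scale = math.sqrt(2.0 * beta ** 2 / total_var)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (2.0 * total_var))


def best_host(player, hosts):
    """Pick the host whose skill belief gives the highest match quality."""
    return max(hosts, key=lambda host: match_quality(player, host))


# Hypothetical player and candidate hosts, each given as (mu, sigma).
me = (27.0, 2.0)
hosts = [(35.0, 2.0), (26.5, 2.5), (20.0, 1.0)]
print(best_host(me, hosts))
```

Both numbers enter the score: a large gap in average skill lowers the quality, and so does high uncertainty in either player's skill.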