Share this page
Share this page E-mail this page Print this page RSS feeds
Home > Projects > Multi-Armed Bandits
Multi-Armed Bandits

This is an umbrella project for several related efforts at Microsoft Research Silicon Valley that address various Multi-Armed Bandit (MAB) formulations motivated by web search and ad placement. The MAB problem is a classical paradigm in Machine Learning in which an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies.

The name "multi-armed bandits" comes from a whimsical scenario in which a gambler faces several slot machines, a.k.a. "one-armed bandits", that look identical at first but produce different expected winnings. The crucial issue here is the trade-off between acquiring new information (exploration) and capitalizing on the information available so far (exploitation). While the MAB problems have been studied extensively in Machine Learning, Operations Research and Economics, many exciting questions are open. One aspect that we are particularly interested in concerns modeling and efficiently using various types of side information that may be available to the algorithm.

External visitors and collaborators

Prof. Robert Kleinberg (Cornell)
Prof. Eli Upfal (Brown)
Yogi Sharma (Cornell; Alex Slivkins' intern, summer'08)
Umar Syed (Princeton; Nina Mishra's intern, summer'08) 

Current and past MAB projects

  • Adapting to a stochastically changing environment
    Alex Slivkins and Eli Upfal  (COLT'08 pdf)
    We study a version of the stochastic MAB problem in which the expected reward of each arm evolves stochastically and gradually in time, following an independent Brownian motion or a similar process. Our benchmark is a hypothetical policy that chooses the best arm in each round.
  • MAB and experts problems in metric spaces
    • Robert Kleinberg, Alex Slivkins and Eli Upfal (STOC'08 pdf)
      We introduce a version of the stochastic MAB problem, possibly with a very large set of arms, in which the expected payoffs obey a Lipschitz condition with respect to a given metric space. The goal is to minimize regret as a function of time, both in the worst case and for 'nice' problem instances.
    • Regret dichotomies: Robert Kleinberg and Alex Slivkins (SODA'10 pdf)
      We focus on the connections between online learning and metric topology. The main result that the worst-case regret is either O(log t) or at least sqrt{t}, depending on whether the completion of the metric space is compact and countable. We prove a number of other dichotomy-style results, and extend them to the full-feedback (experts) version.
    • Contextual MAB with similarity info: Alex Slivkins (2009 pdf)
      In the 'contextual bandits' setting, in each round nature reveals a 'context' x, algorithm chooses an 'arm' y, and the expected payoff is µ(x,y). Similarity info is expressed by a metric space over the (x,y) pairs such that µ is a Lipschitz function. Our algorithms are based on adaptive (rather than uniform) partitions of the metric space which are adjusted to the popular and high-payoff regions.
    • Contextual & adversarial MAB: Ittai Abraham and Alex Slivkins (2009) 
  • Truthful MAB mechanisms
    Moshe Babaioff, Alex Slivkins and Yogi Sharma (EC'09 pdf)
    We consider a multi-round auction setting motivated by pay-per-click auctions in the Internet advertising, which can be viewed as a strategic version of the MAB problem. We investigate how the design of MAB algorithms is affected by the restriction of truthfulness. We show striking differences in terms of both the structure of an algorithm and its regret. 
  • Using MAB algorithms to improve web search results
    • Shifting Intent: Umar Syed, Alex Slivkins and Nina Mishra (NIPS'09)
    • Sreenivas Gollapudi, Filip Radlinski and Alex Slivkins (2009)
Publications