## Representative Publications

**Abstract.** We provide the first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem.
In this problem, the learner repeatedly chooses an action on the basis of a context and receives reward for the chosen action,
with the goal of achieving reward competitive with a large class of policies. We analyze two settings: (i) in the transductive
setting the learner knows the set of contexts a priori; (ii) in the small separator setting, there exists a small set of contexts
such that any two policies behave differently on at least one context in the set. Our algorithms fall into the follow-the-perturbed-leader
family (Kalai and Vempala, 2005) and achieve regret O(T^{3/4}\sqrt{K\log(N)}) in the transductive setting and O(T^{2/3} d^{3/4} K\sqrt{\log(N)})
in the separator setting, where K is the number of actions, N is the number of baseline policies, and d is the size of the separator.
We in fact solve the more general adversarial contextual semi-bandit linear optimization problem, while in the full-information
setting we address the even more general contextual combinatorial optimization problem. We provide several extensions
and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.
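To give a rough flavor of the follow-the-perturbed-leader template these algorithms build on, here is a minimal full-information sketch, not the paper's oracle-efficient contextual version; the exponential noise scale `eta` and the per-round policy-reward matrix are illustrative assumptions:

```python
import random

def ftpl(rewards, eta=1.0, seed=0):
    """Follow-the-Perturbed-Leader (full-information sketch).

    rewards[t][i] is the reward of baseline policy i at round t.
    Each round, play the policy with the highest perturbed cumulative
    reward so far; exponential noise of scale eta is redrawn each round.
    Returns the total reward collected by the learner.
    """
    rng = random.Random(seed)
    n = len(rewards[0])
    cum = [0.0] * n                    # cumulative reward of each policy
    total = 0.0
    for row in rewards:
        perturbed = [cum[i] + rng.expovariate(1.0 / eta) for i in range(n)]
        i = max(range(n), key=perturbed.__getitem__)   # perturbed leader
        total += row[i]
        cum = [c + r for c, r in zip(cum, row)]        # full-information update
    return total
```

After a few rounds the perturbation rarely overturns a clear leader, so the learner collects nearly the best fixed policy's reward.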

**Abstract.** We present an analysis framework for bounding the price of anarchy (POA) in games that have many players, as is the case for many of the games most pertinent to computer science applications. We use this framework to demonstrate that, in many of the models in which the POA has been studied, the POA in large games is much smaller than the worst-case bound. Our framework also differentiates between mechanisms with similar worst-case performance, such as simultaneous uniform-price auctions and greedy combinatorial auctions, thereby providing new insights about which mechanisms are likely to perform well in realistic settings.

**Abstract.** We study the quality of outcomes in repeated games when the population of players is dynamically changing, and where participants use learning algorithms to adapt to the changing environment. Game theory classically considers Nash equilibria of one-shot games, while many games are played repeatedly, and in such games players often use algorithmic tools to learn to play in the given environment.
Games of this flavor include packet routing and Internet ad auctions.

In this paper we analyze the efficiency of repeated games in dynamically changing environments. An important trait of
learning behavior is its adaptability to changing environments, provided the learning method used is adaptive, i.e.,
does not rely too heavily on experience from the distant past. We show that, in large classes of games, if players
choose their strategies in a way that guarantees low adaptive regret, high social welfare is
ensured, even under very frequent changes. This result extends previous work showing high welfare in stable environments.
The price of anarchy was originally introduced to study the loss of efficiency at
Nash equilibria of one-shot games. More recently, many price of anarchy results have been extended to repeated game settings,
assuming players use no-regret learning methods and play in a steady environment with a stable player population. However, typical online environments are rarely stable.

A main technical tool for our analysis is the existence of a solution to the welfare maximization problem that is both close to optimal and relatively stable over time.
Such a solution serves as a benchmark in the efficiency analysis of learning outcomes. We show that such a stable and close to optimal solution exists for many problems,
even in cases when the exact optimal solution can be very unstable.

We demonstrate our techniques by focusing on two classes of games as examples: independent item auctions and congestion
games. In both applications we show that adaptive learning guarantees high social welfare even with surprisingly high churn in the player population.

**Abstract.** The main goal of this paper is to develop a theory for inferring player valuations from observed data in the generalized second price auction without relying on the Nash equilibrium assumption. Existing work in economics on inferring agent values from data relies on the assumption that all participants' strategies are best responses to the observed play of the other players, i.e., that they constitute a Nash equilibrium. In this paper, we show how to perform inference under a weaker assumption instead: that players are using some form of no-regret learning. Learning outcomes have emerged in recent years as an attractive alternative to Nash equilibrium for analyzing game outcomes, modeling players who have not reached a stable equilibrium but instead use algorithmic learning, aiming to learn the best way to play from previous observations. In this paper we show how to infer the values of players who use such algorithmic learning strategies. This inference is an important
first step before testing any learning-theoretic behavioral model on auction data. We apply our techniques to a dataset from Microsoft's sponsored search ad auction system.
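To give a flavor of inference from a no-regret assumption (a stylized single-bidder sketch with exact no-regret and hypothetical per-round allocation/payment data, not the paper's estimator): if a bidder's per-round utility is v·x − p, then no regret against each fixed alternative bid yields a linear inequality in v, and intersecting these inequalities gives an interval of rationalizable values.

```python
def rationalizable_values(observed, alternatives):
    """Interval of values v consistent with no regret, for utility v*x - p.

    observed: list of (allocation x, payment p) under the observed bids.
    alternatives: list of such lists, one per fixed alternative bid.
    No regret against alternative a means v*(Xo - Xa) >= Po - Pa, where
    X and P are cumulative allocations and payments.
    """
    Xo = sum(x for x, p in observed)
    Po = sum(p for x, p in observed)
    lo, hi = 0.0, float("inf")
    for alt in alternatives:
        Xa = sum(x for x, p in alt)
        Pa = sum(p for x, p in alt)
        dX, dP = Xo - Xa, Po - Pa
        if dX > 0:
            lo = max(lo, dP / dX)      # lower bound on v
        elif dX < 0:
            hi = min(hi, dP / dX)      # upper bound on v
        elif dP > 0:
            return None                # no value rationalizes the play
    return (lo, hi)
```

With noisy data one would relax each inequality by the regret budget, shrinking the constraints rather than imposing them exactly.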

**Abstract.** We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates
to approximate efficiency and to correlated equilibria in multiplayer normal-form games. When each player in a game uses an
algorithm from our class, their individual regret decays at rate O(T^{-3/4}), while the sum of utilities converges to an approximate
optimum at rate O(T^{-1})--an improvement upon the worst-case O(T^{-1/2}) rates. We show a black-box reduction that lets any algorithm in
the class achieve O(T^{-1/2}) rates against an adversary, while maintaining the faster rates against algorithms in the class.
Our results extend those of Rakhlin and Sridharan (2013) and Daskalakis et al. (2014),
who analyzed only specific algorithms in two-player zero-sum games.
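A minimal sketch of one recency-biased algorithm of this flavor: Hedge run on cumulative losses plus an extra copy of the most recent loss vector, which acts as an optimistic prediction of the next loss. The step size `eta` and the loss encoding are illustrative assumptions, not the paper's exact parameterization.

```python
import math

def optimistic_hedge(losses, eta=0.1):
    """Hedge with recency bias.

    losses[t][i] is the loss of action i at round t. Each round we weight
    actions by cumulative loss plus one extra copy of the last loss seen
    (the 'optimistic' term). Returns the sequence of distributions played.
    """
    n = len(losses[0])
    cum = [0.0] * n
    last = [0.0] * n                      # prediction of the next loss
    plays = []
    for row in losses:
        scores = [cum[i] + last[i] for i in range(n)]
        m = min(scores)                   # shift for numerical stability
        w = [math.exp(-eta * (s - m)) for s in scores]
        z = sum(w)
        plays.append([x / z for x in w])
        cum = [c + l for c, l in zip(cum, row)]
        last = row
    return plays
```

When opponents' play changes slowly, the last loss is an accurate prediction, which is what drives the faster rates against algorithms in the same class.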

**Abstract.** Individual decision makers consume information revealed by previous decision makers, and produce information that may help
future decision makers. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as elsewhere, such as in medical decisions.
Each decision maker, when required to select an action, would individually prefer to exploit: to select the action with the highest expected reward given her information. At the same time, each decision maker would prefer previous decision makers to explore, producing information about the rewards of the various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance exploration and exploitation, and maximize social welfare.

We formulate this problem as a multi-armed bandit problem (and various generalizations thereof) under incentive-compatibility
constraints induced by the agents' Bayesian priors. We design an incentive-compatible bandit algorithm for the social planner with asymptotically optimal regret. Further, we provide a black-box reduction from an arbitrary multi-armed bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
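For concreteness, here is a textbook multi-armed bandit algorithm of the kind such a black-box reduction could wrap (UCB1). The `pull(arm)` reward interface is an illustrative assumption, and nothing in this sketch implements the incentive-compatibility layer itself.

```python
import math

def ucb1(pull, n_arms, T):
    """UCB1: pull each arm once, then always pull the arm with the
    highest empirical mean plus an optimism bonus that shrinks as the
    arm is sampled more. pull(arm) returns a reward in [0, 1].
    Returns the total reward collected over T rounds.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(T):
        if t < n_arms:
            arm = t                               # initialization phase
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean
        total += r
    return total
```

The reduction described above would take an algorithm like this as a black box and disclose information to agents so that following its recommendations is in each agent's interest.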

**Abstract.** We consider common-value hybrid auctions among two asymmetrically informed bidders, where the winning bidder pays his bid with some positive probability p and the losing bid otherwise. Under the assumption of discrete and affiliated signals, we give an explicit characterization of the (unique) equilibrium, based on a simple recurrence relation, which gives rise to a linear-time algorithm for explicitly computing the equilibrium.
By analyzing the execution of the algorithm, we derive several insights about the
equilibrium structure. First, we show that equilibrium revenue is decreasing in p, and that the limit second-price equilibrium selected as
p -> 0 has the highest revenue, in stark contrast to the revenue collapse of the second-price auction predicted by the trembling-hand
equilibrium selection of Abraham et al. We further show that the Linkage Principle can fail to hold even in a pure first-price auction with binary signals:
public revelation of a signal to both bidders may decrease the auctioneer's revenue. Lastly, we analyze the effects of public acquisition of additional
information on bidder utilities and exhibit cases in which both bidders strictly prefer that a specific bidder receive additional information.

**Abstract.** Many websites encourage user participation via the use of virtual rewards like badges. While badges typically have no explicit value, they act as symbols of social status within a community. In this paper, we study how to design virtual incentive mechanisms that maximize the total contributions made to a website when badges are valued only as a symbol of social status. We consider a game-theoretic model where users exert costly effort to make contributions and, in return, are awarded badges. The value of a badge is determined endogenously by the number of users who earn an equal or higher badge;
as more users earn a particular badge, the value of that badge diminishes for all users.

We show that among all possible mechanisms for assigning
status-driven rewards, the optimal mechanism is a leaderboard with a cutoff: users who contribute less than a certain threshold receive nothing, while the remaining users are ranked by contribution. We next study the necessary features of approximately optimal mechanisms and find that approximate optimality is influenced by the convexity of status valuations, i.e., whether being ranked above more people has an increasing or decreasing marginal effect on a user's satisfaction. When status valuations are concave, any approximately optimal mechanism must contain a coarse status partition, i.e., a partition of users into status classes whose sizes grow as the number of users grows. Conversely, when status valuations are convex, we prove that fine partitioning, i.e., a partition of users into status classes whose sizes stay constant as the number of users grows, is necessary for approximate optimality.
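A toy sketch of the leaderboard-with-cutoff mechanism described above. Modeling status as the number of users ranked strictly below is one illustrative instantiation of status diminishing in the number of equal-or-higher badges; ties are ignored for simplicity.

```python
def leaderboard_with_cutoff(contributions, threshold):
    """Users below the threshold get no badge; the rest are ranked by
    contribution. A user's status is modeled here as the number of users
    ranked strictly below them, so it shrinks as more users place above.
    Returns one status value per user, in the input order.
    """
    n = len(contributions)
    order = sorted(range(n), key=lambda i: contributions[i], reverse=True)
    status = [0] * n
    for pos, i in enumerate(order):
        status[i] = 0 if contributions[i] < threshold else n - 1 - pos
    return status
```

For example, with contributions [5, 3, 1, 4] and threshold 2, the user contributing 1 falls below the cutoff and receives nothing, while the others are ranked 5 > 4 > 3.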

**Abstract.** We initiate the study of efficient mechanism design with guaranteed
good properties even when players participate in multiple
mechanisms simultaneously or sequentially. We define the class of smooth mechanisms, related to the smooth games defined by Roughgarden, which can be thought of as mechanisms that generate approximately market-clearing prices.
We show that smooth mechanisms result in high-quality outcomes both in equilibrium and in learning outcomes in the full-information setting, as well as in Bayesian equilibrium with uncertainty about participants.
Our main result is that smooth mechanisms compose well: smoothness locally at each mechanism implies global efficiency.

For mechanisms where good performance requires that bidders do not bid above their value, we identify the notion of a weakly smooth mechanism. Weakly smooth mechanisms, such as the Vickrey auction, are approximately efficient under the no-overbidding assumption, and the weak smoothness property is also maintained by composition.

In most of the paper we assume participants have quasi-linear valuations. We also extend some of our results to settings where participants have budget constraints.