We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare.

If the advertisers bid their true private values, our problem is equivalent to
the **multi-armed bandit problem**, and thus can be viewed as a strategic
version of the latter. In particular, for both problems the quality of an
algorithm can be characterized by **regret**, the difference in social welfare
between the algorithm and the benchmark which always selects the same “best"
advertisement. We investigate how the design of multi-armed bandit algorithms is
affected by the restriction that the resulting mechanism must be truthful. We
find that truthful mechanisms have certain strong structural properties –
essentially, they must separate exploration from exploitation – **and** they
incur much higher regret than the optimal multi-armed bandit algorithms.
Moreover, we provide a truthful mechanism which (essentially) matches our lower
bound on regret.