In conjunction with the 36th Annual ACM SIGIR Conference (SIGIR 2013)

Workshop Description

Internet advertising, a form of advertising that uses the Internet to deliver marketing messages and attract customers, has seen exponential growth since its inception around twenty years ago; it has been pivotal to the success of the World Wide Web. The dramatic growth of Internet advertising poses great challenges to information retrieval, machine learning, data mining and game theory, and it calls for novel technologies to be developed.

The main purpose of this workshop is to bring together researchers and practitioners in the area of Internet Advertising and enable them to share their latest research results, to express their opinions, and to discuss future directions.

Internet advertising is one of the most important monetization engines of the Internet. It is closely associated with many IR applications, such as web portals, search engines, social networks, e-commerce, and mobile networks and apps. Researchers in information retrieval, machine learning, data mining, and game theory are developing creative ideas to advance the technologies in this area.

We look forward to your contribution and attendance! See you in Dublin!

Invited Talks

Invited Talk 1: Financial Methods in Computational Advertising

Dr. Jun Wang (University College London, U.K.)

Abstract: Computational Advertising has recently emerged as a new scientific sub-discipline, bridging areas such as information retrieval, data mining, machine learning, economics, and game theory. In this talk, I shall present a number of challenging issues by analogy with financial markets. The key vision is that display opportunities are regarded as raw material “commodities” similar to petroleum and natural gas: for a particular ad campaign, the effectiveness (quality) of a display opportunity should not depend on where it is bought or to whom it belongs, but on how well it benefits the campaign (e.g., the underlying web users’ satisfaction or response rates). With this vision in mind, I will go through the recently emerged real-time advertising, aka Real-Time Bidding (RTB), and provide the first empirical study of RTB on an operational ad exchange. We show that RTB, though it has issues of its own, has the potential to facilitate a unified and interconnected ad marketplace, bringing it one step closer to the properties of financial markets. In the latter part of this talk, I will discuss Programmatic Premium, i.e., a counterpart to RTB that makes future display opportunities accessible. For that, I will present a new type of ad contract, the ad option, which gives its holder the right, but not the obligation, to purchase ads. With option contracts, advertisers have increased certainty about their campaign costs, while publishers can increase advertiser loyalty. I show that our proposed pricing model for the ad option is closely related to a special exotic option in finance that contains multiple underlying assets (multi-keywords) and is also multi-exercisable (multi-clicks). Experimental results on real advertising data verify our pricing model and demonstrate that advertising options can benefit both advertisers and search engines.
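
To illustrate the "right, but not the obligation" idea in the abstract above, here is a minimal Python sketch of a plain call-style payoff. It is not the pricing model presented in the talk (which covers multiple keywords and multiple exercises); the function names, the per-click strike price, and the upfront fee are all assumptions made for illustration.

```python
def option_payoff(spot_cpc: float, strike_cpc: float) -> float:
    """Saving per click for the option holder.

    The holder exercises only when the market (spot) cost per click is
    above the agreed strike price; otherwise the option is left unused.
    """
    return max(spot_cpc - strike_cpc, 0.0)


def campaign_cost(spot_cpcs, strike_cpc: float, upfront_fee: float) -> float:
    """Total cost of buying one click per period while holding the option.

    spot_cpcs   -- hypothetical market cost per click in each period
    strike_cpc  -- hypothetical fixed price agreed in the option contract
    upfront_fee -- hypothetical price paid once to acquire the option
    """
    # In every period the advertiser pays the cheaper of the two prices.
    return upfront_fee + sum(min(spot, strike_cpc) for spot in spot_cpcs)
```

Under these assumptions the total cost can never exceed the upfront fee plus the strike price times the number of clicks, which is one way to read the "increased certainty about campaign costs" mentioned in the abstract.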

Bio: Jun Wang is a Senior Lecturer (Associate Professor) at University College London and Founding Director of the MSc/MRes in Web Science and Big Data Analytics. His main research interests are in the areas of information retrieval, data mining and online advertising. He has recently studied financial methods for online advertising. Dr. Wang has published over 70 research papers in leading journals and conference proceedings. He was a recipient of the Beyond Search – Semantic Computing and Internet Economics award sponsored by Microsoft Research, USA in 2007; he also received the Best Doctoral Consortium award at ACM SIGIR06 for his unified theory of collaborative filtering, the Best Paper Prize at ECIR09 for his pioneering work on the Portfolio Theory of Information Retrieval, and the Best Paper Prize at ECIR12 for top-k retrieval modelling. Dr. Wang obtained his PhD degree from Delft University of Technology, the Netherlands; his MSc degree from the National University of Singapore, Singapore; and his Bachelor's degree from Southeast University, Nanjing, China.

Invited Talk 2: Information Science versus Data Science for Digital Advertising and Marketing

Dr. James G. Shanahan (Independent Consultant, U.S.A.)

Bio: Jimi has spent the last 25 years developing and researching cutting-edge information management systems that harness machine learning, information retrieval, and linguistics. During the summer of 2007, he started a boutique consultancy (Church and Duncan Group Inc., in San Francisco) whose major goal is to help companies leverage their vast repositories of data using statistics, machine learning, optimisation theory and data mining for big data applications (billions of examples) in areas such as web search, local and mobile search, and digital advertising and marketing. Church and Duncan Group’s clients include Adobe, AT&T, Akamai, W3i, eBay, and TapJoy, among others. Along with architecting and developing large scale distributed statistical optimization systems for his clients, Jimi also leads and hires engineers and scientists, and provides business insights and strategic guidance to sales, analytics and business development groups within these organizations. In addition, Jimi has been affiliated with the University of California at Santa Cruz since 2009, where he teaches a sequence of graduate courses on big data analytics, machine learning, and stochastic optimization (TIM 206, ISM 209, ISM 250 and ISM 251). He advises several high-tech startups (e.g., NativeX, InferSystems) and is executive VP of science and technology at Irish Innovation Center (IIC). He has served as a fact and expert witness.

Prior to founding Church and Duncan Group Inc., Jimi was Chief Scientist and executive team member at Turn Inc. (an online ad network that has recently morphed into a demand side platform). Prior to joining Turn, Jimi was Principal Research Scientist at Clairvoyance Corporation, where he led the “Knowledge Discovery from Text” Group. In the late 1990s he was a Research Scientist at Xerox Research Center Europe (XRCE), where he co-founded Document Souls, an anticipatory information system in which documents were given personalities of information services that foraged the web to stay informed and informative. In the early 90s, he worked on the AI Team within the Mitsubishi Group in Tokyo.

He has published six books, over 50 research publications, and 15 patents in the areas of machine learning and information processing. Jimi chaired CIKM 2008 (Napa Valley), co-chaired the International Conference on Weblogs and Social Media (ICWSM) 2011 in Barcelona, and was PC co-chair of ICWSM 2012 (Dublin). He co-chaired the ISSDM Workshop on Knowledge Management: Analytics and Big Data at UC Santa Cruz. He has organized several workshops in digital advertising as part of SIGIR, NIPS and SIGKDD. He is regularly invited to give talks at international conferences and universities around the world. Jimi received his Ph.D. in engineering mathematics from the University of Bristol, U.K. and holds a Bachelor of Science degree from the University of Limerick, Ireland. He is a Marie Curie fellow and member of IEEE and ACM. In 2011 he was selected as a member of the Silicon Valley 50 (Top 50 Irish Americans in Technology).

Invited Talk 3: Beyond Bag-of-words: Machine Learning for Matching

Dr. Jun Xu (Huawei Noah’s Ark Lab, Hong Kong)

Abstract: Dealing with the mismatch between queries and ads (documents) is one of the most critical research problems in search. Recently, researchers have spent significant effort addressing this grand challenge. The major approach is to conduct more query and document understanding, and to perform matching between enriched representations. With the availability of large amounts of log data and advanced machine learning techniques, this has become more feasible and significant progress has been made. In this talk, I will give a survey of newly developed machine learning technologies for matching. I will focus on describing the fundamental problems, as well as the novel solutions.

Bio: Jun Xu is a Researcher at Noah’s Ark Lab, Huawei Technologies in Hong Kong. He received his PhD in computer science from Nankai University, China in 2006. He worked at Microsoft Research Asia from 2006 to 2012 and joined Huawei Noah’s Ark Lab in 2012. Jun’s research interests focus on information retrieval and web search. He has published extensively in prestigious conferences and journals including SIGIR, WWW, WSDM, JMLR, and TOIS. Jun is very active in the research community and has served or is serving the top conferences and journals. He developed the learning to rank algorithms IR-SVM and AdaRank, and the large scale topic models RLSI and GMF. He released the source code of AdaRank and RLSI, and the LETOR dataset, to the academic community.

Invited Talk 4: Semantically Related Bid Phrase Recommendation from Advertiser Web Pages

Dr. Sayan Pathak (Microsoft, India)

Abstract: Sponsored search systems such as Bing and Google have three players: advertisers, users and publishers. Users search for relevant information through search queries, and advertisers bid for these user queries (bid keywords) to sell their services. As advertisers are not fully aware of the user queries, they require keyword recommendation. An important aspect of these keyword recommendation engines is to recommend relevant keywords for a given advertiser based on the advertiser’s context, such as website and business category. Previous work in the area of keyword recommendation is based on mining the advertiser’s website and other context; however, these approaches are limited by the keywords present on the advertiser’s website. We pose the problem of keyword recommendation as a large-scale multi-label learning task where labels are keywords. In this talk, we present a multi-label random forest formulation that associates each data point (advertiser) with a relevant subset of labels (keywords) from the universe of labels (keywords). The proposed approach automatically generates training data for the classifier from click logs without any human annotation or intervention. As the training complexity is polynomial in the number of training points and the prediction complexity is logarithmic in the number of training points, the proposed approach can be used for large-scale applications. Large-scale experiments conducted on MapReduce with 50 million webpages and 10 million keywords extracted from Bing logs reveal significant gains in P@10 compared to previous ranking and NLP based techniques.
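
For readers unfamiliar with the P@10 figure quoted in the abstract above, the following Python sketch shows how precision at k is commonly computed for a ranked list of recommended keywords. The function name and the toy inputs are assumptions for illustration; the abstract does not describe the exact evaluation protocol used in the experiments.

```python
def precision_at_k(ranked_keywords, relevant_keywords, k=10):
    """Fraction of the top-k recommended keywords that are relevant.

    ranked_keywords   -- recommendations, best first
    relevant_keywords -- set of keywords judged relevant for the advertiser
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_keywords[:k]
    hits = sum(1 for kw in top_k if kw in relevant_keywords)
    # Dividing by k (not len(top_k)) penalizes lists shorter than k.
    return hits / k
```

For example, if two of the top four recommendations are relevant, `precision_at_k(ranked, relevant, k=4)` is 0.5.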

Bio: Sayan Pathak received his BS from the Indian Institute of Technology, Kharagpur, India in 1994, and his MS and PhD degrees in Bioengineering from the University of Washington, Seattle, in 1996 and 2000, respectively. Currently, he is working at the Online Services Division of Microsoft as a principal program manager leading algorithm R&D in the Bing Ads team. Prior to moving into the online advertising space, he researched and integrated cutting edge machine learning technology into Microsoft products in collaboration with Microsoft Research Labs, both as a developer and as a program manager. He has been a faculty member at the University of Washington for the past 10 years and has over 8 years of teaching experience. Prior to joining Microsoft, he worked at the Allen Institute for Brain Science and the University of Washington, and has been a principal investigator on several US National Institutes of Health (NIH) grants. He has published in leading journals such as Nature, Nature Neuroscience, IEEE and PNAS in the areas of large scale machine learning, computer vision, classification, image and signal processing.

Invited Talk 5: Large Scale Search Algorithms for Advertising Application

Dr. Vanja Josifovski (Google, U.S.A.)

Abstract: Today's online advertising comes in many different formats, with different technologies used for ad selection: from simple hash lookups to recommender systems. In this talk we give an overview of the different ad selection problems and show how they can be mapped to large scale, high dimensional similarity search. Specifically, we will review the WAND algorithm, which was proposed about a decade ago for high-dimensional similarity search and has since been adapted for multiple search and data mining applications related to online advertising.

Bio: Vanja Josifovski is a Technical Lead in the Strategic Technologies group at Google where he is leading projects in the areas of recommender systems, information extraction and computational advertising. Prior to Google, Vanja was a Sr. Director of Research at Yahoo! Research where he worked on the next generation Contextual Advertising, Sponsored Search and Display Advertising Targeting platforms. Based on his work, Vanja co-taught a computational advertising course at Stanford University. Even earlier in his career, Vanja was a Research Staff Member at the IBM Almaden Research Center working on databases and enterprise search.

Invited Talk 6: Query-Ads Matching in Sponsored Search: Challenges and Solutions

Dr. Yunhua Hu (Alibaba Corporation, China)

Abstract: In sponsored search, all related Ads for a given query are usually matched by following a three-stage matching model. The model consists of a query rewriting stage, a Bid-Ads ranking stage, and an Ads ranking stage. However, such a three-stage matching model is still not perfect for sponsored search. For instance, it is not easy to evaluate the accuracy of the whole query-ads matching process. It is also difficult to optimize each stage. It is not even clear whether there is still room for improvement. In this session, I will introduce the challenges we faced. I will also share some initial solutions for these challenges.

Bio: Yunhua Hu is a Senior Technical Specialist in the Alimama business group at Alibaba Corporation. He focuses on query analysis based on big data, personalized search, and matching and ranking algorithms in search advertising. Before he joined Alibaba, he worked in the Natural Language Processing (NLP) group and the Web Search and Mining (WSM) group at Microsoft Research Asia (MSRA) for more than 5 years. He conducted advanced research on Enterprise Search, Academic Search, and log mining for Web Search. Some key technologies have been transferred into Microsoft Office and the Bing search engine. He has also published several papers in SIGIR, WSDM, CIKM, JCDL, and AAAI.

Invited Paper

Adaptive Keywords Extraction with Contextual Bandits for Advertising on Parked Domains

Shuai Yuan (University College London, U.K.)

Jun Wang (University College London, U.K.)

Maurice van der Meer (B.V. DOT TK, Netherlands)

Abstract: Domain name registrars and URL shortener service providers place advertisements on parked domains (Internet domain names which are not in service) in order to generate profits. As the web content has been removed, it is critical to make sure the displayed ads are directly related to the intents of the visitors who have been directed to the parked domains. Because of the missing content in these domains, it is non-trivial to generate keywords that describe the previous content and therefore the users' intents. In this paper we discuss the adaptive keywords extraction problem and introduce an algorithm based on BM25F term weighting and linear multi-armed bandits. We built a prototype over a production domain registration system and evaluated it using crowdsourcing in multiple iterations. The prototype is compared with other popular methods and is shown to be more effective.
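
As a rough illustration of the "linear multi-armed bandits" mentioned in the abstract above, here is a minimal Python sketch of a LinUCB-style bandit in which each candidate keyword is an arm. This is not the authors' algorithm: the class and function names, the `alpha` exploration parameter, and the idea of passing a BM25F weight as one entry of the feature vector are all assumptions made for illustration.

```python
import math


class LinUCBArm:
    """One arm (candidate keyword) with its own linear reward model."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed value)
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, vec):
        return [sum(r * v for r, v in zip(row, vec)) for row in self.a_inv]

    def ucb(self, x):
        """Estimated reward plus an upper-confidence exploration bonus."""
        theta = self._a_inv_times(self.b)
        mean = sum(t * v for t, v in zip(theta, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(v * w for v, w in zip(x, a_inv_x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Sherman-Morrison rank-one update of the inverse, then of b."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(v * w for v, w in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def select_keyword(arms, features):
    """Pick the keyword with the highest upper confidence bound.

    arms     -- dict mapping keyword to LinUCBArm
    features -- dict mapping keyword to its feature vector, e.g. a
                hypothetical [bm25f_weight, bias] pair
    """
    return max(arms, key=lambda kw: arms[kw].ucb(features[kw]))
```

In such a setup, the observed reward could be a click on the ad shown for the selected keyword, so that keywords whose features predict clicks are chosen more often over time while uncertain ones still get explored.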

Submission deadline
June 21, 2013, 11:30pm PDT
