External Research & Programs

Beyond Search – Semantic Computing and Internet Economics Awards

Microsoft Research and Microsoft adCenter announced the 20 recipients of the Beyond Search – Semantic Computing and Internet Economics 2007 RFP awards, totaling $1,000,000 (USD) in funding. This RFP focuses on advancing academic research and publication in the area of Internet technologies and services, particularly in semantic technologies, to improve the ways in which information seekers find, share, and discover information, and in online advertising, to explore the technical, research, societal, and commercial issues around Internet economics.

Beyond Search – Semantic Computing and Internet Economics RFP Award Recipients

Inferring Commercial Intent: Taxonomy Construction, Intent Classification, and Applications
Eugene Agichtein; Charles Clarke
Emory University, U.S.; University of Waterloo, Canada

Intent detection is one of the crucial long-standing goals of information access. This research proposes to automatically identify the user queries and behavior patterns associated with commercial intent. Specifically, the researchers will develop a taxonomy of commercial intent to classify user information needs along dimensions such as immediate vs. near-term purchase, or purchase vs. research. This taxonomy will be developed using a combination of user behavior mining, user surveys, and longitudinal tracking of user purchasing behavior. The researchers will also develop techniques to automatically infer information needs from user actions, such as queries, result and ad clickthrough, and browsing behavior, in order to classify the session into the corresponding commercial intent category. The general intent inference algorithms developed will also be applicable to other (non-commercial) intent detection tasks, such as detection of scholarly research intent, health information seeking, and market research. Finally, the researchers will develop these techniques and evaluate their effectiveness in improving practical applications such as ad ranking and content personalization. The findings and the resulting algorithms will help advertisers automatically create more appropriate and relevant ad content, improve ad ranking by matching the tone, focus, and content of the ads with the user's current intent, and contribute to the general understanding of user intent inference and web search behavior modeling.

Socially Structured User Behavior and Externalities in Sponsored Search Auctions
Sander Bohte, Nicole Immorlica, Vangelis Markakis, Han La Poutre
Center for Math and Computer Science, Netherlands

The proposed research addresses two independent aspects of ad allocation optimization: personalization and externality. As part of this proposal, the researchers will explore the behavioral relations between a user and his or her social network by deriving social structures from the query logs based on proper names. They will then use these structures to predict the types of advertisements that the user might find useful and appealing. The second part of the research will concentrate on refining the ad allocation model to account for the fact that the advertisement a user is likely to view and respond to is influenced by the content of other, surrounding advertisements, so-called 'externalities'.

Scalable Annotation Search and Aggregation of Semi-structured Graph and Text Data Models
Soumen Chakrabarti, S. Sudarshan
Indian Institute of Technology Bombay, India

Search on unstructured data on the Web has received a lot of attention. However, there are numerous additional sources, including (a) diverse personal data: email, desktop files, contacts; (b) organizational data, often from disconnected islands in the organization: LDAP directory data, HR and other ERP data; and (c) Web data that is relevant to the organization, such as bibliographic data for a research organization, or stock market data for an investment fund. Today it is possible to query each of these information silos independently; however, one cannot readily exploit the rich connections between them. This research looks at (a) bridging diverse data sources by adding semi-structured annotations to them and then creating probabilistic connections between the annotations; (b) developing query models that can search across data from multiple such sources by aggregating possibly uncertain information from multiple sources to get higher confidence; and (c) developing efficient algorithms for executing queries under the models developed.

Using Query Logs to Associate Semantically Related News Stories
William Cohen
Carnegie Mellon University, U.S.

The long-term goal of this research is to link multiple related stories together temporally, and to automatically construct a cohesive narrative that is easily accessible to the user and that facilitates in-depth study of news on any topic. Current newsreaders are highly driven by recency of information: most of the user’s attention is directed towards events of the last few hours or even minutes. This phenomenon leads to a grasp of current events that is broad but shallow. Many events, however, are better interpreted in the context of previous related events. Understanding which past stories give the best context for an event is difficult, requiring many subtle judgments about relevance, entity identity, and so on. By aggregating information in query logs, this research aims to reconstruct links between related stories.

AdRules: Improving Quality of Ads
Krzysztof Dembczynski, Wojciech Kotlowski, Dawid Weiss
Poznan University of Technology, Poland

The goal of this study is to build a powerful prediction model for estimating the likelihood that a user will act in response to a specific advertisement. This work builds on previous research by designing and constructing a strong model that is capable of handling complex interactions between attributes, yet remains explicit and comprehensible to its users (both the search engine and the advertisers). The researchers hope to achieve this goal by using an ensemble of decision rules. Provided the resulting model is comprehensible, it can be used to make general recommendations for improving the quality of ads, as well as specific recommendations for advertisers that are sensitive to the context of their particular ad and the potential queries matching this ad.

Modeling Trust Influence and Bias in Social Media
Tim Finin, Anupam Joshi
University of Maryland, Baltimore County, U.S.

Web-based social media systems such as blogs, wikis, and forums are an important new way to publish information, engage in discussions, and form communities on the Internet. Since approximately one third to one half of all new Web content is generated by social media systems, their reach and impact are significant. This research seeks a better understanding of the following questions: How can blog communities be identified based on a combination of topic, bias, and underlying beliefs? Which authors and blogs are most influential within a given community? From where do particular beliefs or ideas originate, and how do they spread? What are the most trustworthy sources of information about a particular topic? What opinions and beliefs characterize a community, and how do these opinions change?

Incentives and Algorithms for Robust and Responsive Reputation Systems: a Model Driven Approach
Ashish Goel, Rajeev Motwani
Stanford University, U.S.

With the emergence of Internet commerce, traditional industries such as retail sales and service sectors such as travel, advertising, and media distribution have all undergone major changes. The new systems utilize complex models, which are not yet adequately understood. Since these systems are automated, incentives and algorithmic techniques are crucial to realize their full potential. This project will study one specific aspect of Internet commerce: robust and responsive reputation/recommendation systems. The researchers' preliminary progress in this area is very promising. However, it is unclear how a reputation system for, say, web ranking can be evaluated formally, since there are no well-accepted models for human behavior given a ranked list of items to choose from. The researchers have developed preliminary models that they believe are both accurate and frugal, and have made considerable preliminary progress in developing robust reputation/recommendation systems on top of these models. The researchers will use the Microsoft adCenter query logs to validate and refine their models and algorithms. The released log data will have a significant impact in shaping future research in this area.

Web-scale Semantic Social Mash-Ups with Provenance
Harry Halpin, Henry Thompson
University of Edinburgh, United Kingdom

As the Web grows ever larger and increasing amounts of data become available on it, how can users access and combine data from multiple sources to discover particular information about a particular entity, such as a person, place, or organization? This proposal focuses on doing mash-ups based on semantics, by giving overt semantics to data using a number of common formats such as vCard and OpenID and then doing the 'mash-up' based on these semantics. By using a functional framework with tight ties to a formal logic via the Curry-Howard Isomorphism, provenance-tracking can be built into the very fabric of the mash-up itself. Users can use this provenance information to correct errors in the mash-up and to decide whether they trust the results. This lets ordinary users, not only experts, create data with semantics and share this data with other users.

Equilibrium Computation and Semi-automated Mechanism Design for adCenter Auctions
Kevin Leyton-Brown
University of British Columbia, Canada

The proposed research will use a recently developed compact representation for game theory ('action-graph games') to compute equilibria of realistic ad auctions. The researchers will mine the released adCenter data logs to develop a realistic model. Given a game-theoretic model, it is possible to perform 'semi-automated mechanism design', a computational process that will recommend alternate auction designs that would yield improved performance according to a given benchmark. The researchers will freely release software for building and analyzing such models, and publish the research results.

Learning to Advertise in Sponsored Search: Relevance Ranking, Diversity, and Beyond
Ping Li
Cornell University, U.S.

The goal of this research is to learn ad relevance and ranking in sponsored search in order to improve user experience. The researchers propose (1) learning ad ranking from click-through data using state-of-the-art algorithms currently applied in (regular) Web search ranking, including support vector machines (SVM) and ensemble boosted trees; (2) incorporating pairwise preference information from click-through history; and (3) taking advantage of the huge volume of unlabeled (undisplayed) data under a semi-supervised learning paradigm, including graph learning and collaborative filtering. The final goal of this research is to develop principled techniques to guide the positioning of ads so as to increase the probability of (at least) one click on sponsored links per displayed ad section. All tasks will lead to very large-scale machine learning problems that are challenging and interesting in their own right. The researchers expect to achieve part of the goals in a one-year time frame and will afterwards continue the investigation using Microsoft data.
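
As a rough illustration of the quantity named in the final goal, the probability of at least one click in a displayed ad section can be computed from per-ad click probabilities. The sketch below assumes that clicks on different ads are independent, which the proposal does not state; the function name and the probabilities passed to it are hypothetical.

```python
from math import prod


def prob_at_least_one_click(click_probs):
    """Probability that at least one ad in a section is clicked.

    Assumption (not from the proposal): clicks on the individual ads
    are independent events with the given probabilities.
    """
    for p in click_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # P(at least one click) = 1 - P(no ad is clicked)
    return 1.0 - prod(1.0 - p for p in click_probs)
```

Under this assumption, positioning ads amounts to choosing which ads fill the section so that this value is as large as possible. Real click behavior depends on position and on neighboring ads, so the independence assumption is only a starting point.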

Using Markov Logic Networks to Infer User Intent for Search Queries
Raymond Mooney
University of Texas at Austin, U.S.

This research uses statistical relational learning, specifically, Markov Logic Networks (MLN), to learn to recognize user intent from web search queries. By combining relational knowledge representation with statistical learning methods, MLNs allow the automatic construction of expressive models for inferring user intent from prior session information and existing data from other users.

Exploiting a Large-Scale Commonsense Knowledge Base for Context-Driven Ad Placement
Sung Hyon Myaeng
Information & Communications University, South Korea

In this proposal, the researchers attempt to address the problems and limitations of topic-based and keyword-based ad matching technology and move toward a pragmatics-based approach. For ad placement, it is crucial to understand the activities, tasks, and states that are pertinent to the authors or readers in order to infer what services or products would be of interest to them or needed by them. The main goal of this research is to develop a novel ad matching method that penetrates into the needs and intents of the potential beneficiaries of ads, moving one step beyond topical similarities. ConceptNet, a large-scale commonsense knowledge base constructed by the general public, is at the center of the proposed research. It will be used to extract context-revealing entities from text, which in turn are to be linked to a need taxonomy to which ads are mapped.

Two-sided Combinatorial Auction Mechanism for Sponsored Search
David Porter, Roumen Vragov, Vernon Smith
Chapman University; Baruch College, CUNY; George Mason University, U.S.

This proposal uses past theoretical advances in consumer search and auction theory to propose a two-sided combinatorial auction mechanism for sponsored search ads. Existing auction mechanisms either assume constant user behavior and analyze advertisers’ bidding patterns, or assume static bids and analyze the utility and attractiveness of the ads for each user. This research will initiate experiments with economically motivated human subjects, using parameters taken from the data available in Microsoft adCenter logs, and test the mechanism along two dimensions: advertiser bidding behaviors and ad ranking methodology.

Integrating the Deep Web with the Shallow Web
Mike Stonebraker
Massachusetts Institute of Technology, U.S.

It is generally accepted that the deep web is significantly bigger than the shallow web. We define the deep web as the information that is available only by filling information into web forms. Current search systems are getting very good at answering queries directed to the shallow web. However, queries that can be answered only from the deep web, such as “What is the Amtrak fare from Boston to New York?” or “What is the flight status of UA 179?”, typically require a user to know the URL of the site with the information. The user must then access that site and fill in the required information. Giving such a query to a search engine instead is likely to yield only frustration for the user. The purpose of this research is to build a software system that can integrate the deep web with the shallow web via semantics.

Goal-Driven Information Retrieval
Jun Wang
University College London, United Kingdom

Today's search technologies rely heavily on users' textual queries to identify their information needs. Yet users have various information needs. To cope with this, this research proposes a goal-driven information retrieval framework, in which the retrieval processes and combination strategies are influenced by automatically learning users' information goals (i.e., types of user needs or retrieval tasks) and by estimating the probability of relevance between user information needs and documents. The research intends to establish a sequential approach to learning relevance by applying recent advances in Bayesian machine learning. It will explicitly model users’ various information goals so that the final algorithm responds to different information goals through a weighted average of different retrieval models. By accounting for the dependency of information needs among users, the research will also correlate users by applying hierarchical Bayesian methods.
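
The weighted average of retrieval models mentioned above could be sketched as follows. This is only an illustration under stated assumptions, not the researchers' algorithm: the goal names, the per-goal scores, and the function `goal_weighted_score` are hypothetical, and the goal probabilities are taken as given rather than learned with Bayesian methods.

```python
def goal_weighted_score(goal_probs, model_scores):
    """Combine per-goal retrieval scores for one document.

    goal_probs:   {goal: estimated probability that the user has this goal}
    model_scores: {goal: relevance score from that goal's retrieval model}
    Both inputs are hypothetical; how they are estimated is out of scope.
    """
    total = sum(goal_probs.values())
    if total <= 0:
        raise ValueError("goal probabilities must sum to a positive value")
    # Weight each model's score by its goal probability, then normalize
    # in case the probabilities do not sum exactly to one.
    weighted = sum(p * model_scores[goal] for goal, p in goal_probs.items())
    return weighted / total
```

For example, with goal probabilities of 0.25 and 0.75 and model scores of 0.8 and 0.4, the combined score is 0.25 × 0.8 + 0.75 × 0.4 = 0.5.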

Addressing and Identifying Privacy Leakage from Query Logs: An Accountability Approach
Daniel Weitzner, Lalana Kagal
Massachusetts Institute of Technology, U.S.

The use of query logs for studying search engines and for improving information retrieval on the Web is invaluable. This use, however, is currently restricted because it might sacrifice user privacy and expose a significant amount of private and identifying information. We believe efforts to address information policy issues such as online privacy have been overly dominated by access restriction and privacy-preserving algorithms such as anonymization, generalization, and perturbation. An alternative is to emphasize the design of systems that provide greater information accountability, as judged against rules governing appropriate use, rather than information security and access restriction. The goal of this proposal is to develop a technical proof-of-concept that can be the basis for monitoring privacy leakage in various query log research contexts.

Practical Strategic Analysis of Sponsored Search Auctions
Michael Wellman
University of Michigan, U.S.

The proposed study will develop valuation models, bidding strategies, and a simulation environment for evaluating advertising bidding strategies under a range of auction rules. Models will be based on adCenter log data. Using techniques from empirical game-theoretic analysis, the study will assess the performance of bidding strategies under various auction rules (using adCenter as a baseline), and produce recommendations for auction rules for specified objectives.

Extracting Query Semantics from Past Query Sequences to Support Exploratory Search
Mingfang Wu, Andrew Turpin, Simon Puglisi, Falk Scholer, James Thom
RMIT University, Melbourne, Australia

Identical query words issued by different users can reflect very different intentions. For example, the query “Microsoft” could be issued by a user interested in the stock price of the company, news releases, operating systems, or computer games. So while it is essential that an information retrieval system deliver search results that contain the query words, this is by no means a guarantee that the documents served will suit the information need. One way to obtain additional information about the possible semantics, or meaning, behind a query is to examine the chain of reformulations of that query performed by previous users who issued the same initial query. Such chains are readily identifiable in query logs, and contain not only reformulated queries but also clickthrough data related to those queries. This research looks at extending a chain model to: 1) intelligently increase the diversity of a search result through application of query semantics learnt from chains; and 2) devise a new search interface that allows visualization of the information gathering and sense-making strategies used by other users who issued the same (or a semantically similar) query.
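
A minimal sketch of how reformulation chains might be pulled from a query log is given below. The log schema of (session_id, timestamp, query) tuples is an assumption for illustration and is not the adCenter log format; the function name is hypothetical, and clickthrough data is omitted for brevity.

```python
from collections import defaultdict


def reformulation_chains(log_rows):
    """Group a query log into reformulation chains keyed by initial query.

    log_rows: iterable of (session_id, timestamp, query) tuples.
    This schema is assumed for illustration only.
    Returns {initial_query: [list of later queries, one list per session]}.
    """
    sessions = defaultdict(list)
    for session_id, timestamp, query in log_rows:
        sessions[session_id].append((timestamp, query))

    chains = defaultdict(list)
    for events in sessions.values():
        events.sort()  # order each session's queries by timestamp
        queries = []
        for _, query in events:
            if not queries or query != queries[-1]:
                queries.append(query)  # skip immediate repeats
        # The first query is the key; the rest form the chain.
        chains[queries[0]].append(queries[1:])
    return dict(chains)
```

Looking up an initial query such as "microsoft" in the result then gives the reformulations that earlier users tried, which is the raw material a chain model would work from.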

Capturing User Intentions: Organize Information Footprints to Improve Exploratory Search
ChengXiang Zhai, Kevin Chang
University of Illinois at Urbana-Champaign, U.S.

While current search engines serve known-item search such as homepage finding very well, they generally cannot support exploratory search effectively. In exploratory search, users do not know their information needs precisely and often lack the knowledge needed to formulate effective queries. Querying alone, as supported by current search engines, is therefore insufficient, and browsing into related information would be very useful. This research proposes a novel navigation-based retrieval framework that unifies querying and browsing and treats both as navigation over topic regions. To support browsing effectively, the researchers will treat search logs as "footprints" left by previous users in the information space and build a multi-resolution topic map to guide a user in navigating that space.

Collaborative Personalized Advertising
Yi Zhang
University of California Santa Cruz, U.S.

How to recommend advertisements relevant to a specific query issued by a specific user in a specific context is an important and challenging research problem. Even a small improvement in accuracy could lead to a big benefit for a search engine company. At the same time, getting information when it is needed is one of the things search engine users want most. The proposed project will tackle this challenge based on Personalized Collaborative Advertising. The goal of the proposed project is to advance the fundamental theory, investigate long-term and short-term practical techniques, and develop efficient tools for building a personalized advertising agent.

 



©2008 Microsoft Corporation. All rights reserved.