Microsoft Live Labs: Accelerating Search in Academic Research 2006 RFP Awards
Microsoft Research announced the twelve recipients of the Microsoft Live
Labs: Accelerating Search in Academic Research 2006 RFP awards, totaling $500,000 (USD) in funding.
The objective of this RFP is to support Live Labs’ collaboration with the
academic research community and is focused on the Internet Search research area.
Specifically, this RFP directly addresses the need for more large-scale data by
making additional real-world search data available to academia. In doing so,
Microsoft seeks to further encourage academic research and innovation in search.
By increasing the availability of relevant, large, and current data sets from
MSN Search, the program will support new data analysis and algorithm development
in Internet Search.
Microsoft Live Labs: Accelerating Search in Academic Research 2006 RFP Award Recipients
VISP: Visualizing Information Search Processes
Lada A. Adamic, Suresh K. Bhavnani
University of Michigan, US
We propose to use the query logs and click-through data to analyze and visualize
the interaction between user behavior, distribution of content, and search engine
ranking. In particular, we will be analyzing the completeness of the information
retrieved by the search engine user, an important factor, for example, when the
query is health related. We will extract the most common health-related queries
and use both human experts and natural language processing to identify key facts
located on the Web pages returned by the search engine. We will then correlate the
search engine ranking with the completeness of information on the page. Our main
goal is to develop a visualization tool that will show the distribution of information
among the search results, the links between the results, and the user click-throughs.
The visualization tool will both contribute to our understanding of information
seeking behavior and enable search engine developers and Web site designers to pinpoint
the difficulty users have in finding comprehensive information.
Vinegar: Leading Indicators in Query Logs
Eytan Adar, Brian Bershad, Steven Gribble, Daniel Weld
University of Washington, US
The flood of queries coming into a search engine represents a slice of the collective
consciousness of Internet users. Events in this stream, when properly detected and
aggregated, can be used to explain current happenings and generate leading indicators
to predict future events. We are working on Vinegar*, a system capable of analyzing
streams of search data to find correlations and draw causal inferences. Our goal is
that Vinegar be able to accurately generate useful indicators in near real-time
through both automatic and manually-guided means. By analyzing search logs in conjunction
with other temporal information (such as news events or blog posts), we hope to understand
how query behavior is impacted by external events and, conversely, how aggregate
search behavior can be predictive of events and trends in other domains.
*The name Vinegar comes from the observation that months before SARS hit the world
newspapers, and even before the disease was acknowledged by the larger Chinese medical
community, the affected population of the Guangdong province in China began buying out
supplies of white vinegar, a local folk remedy.
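The idea of a leading indicator can be illustrated with a small sketch. This is not the Vinegar system: it assumes two equally spaced daily count series (a hypothetical query volume and a hypothetical news volume, both invented for illustration) and simply looks for the lag at which the query series correlates best with the later external series. A high correlation at some lag suggests, but does not establish, that the queries lead the events.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def best_lead(queries, events, max_lag):
    """Return (lag, r) for the lag at which `queries` best predicts `events`.

    A lag of k compares queries at time t with events at time t + k.
    """
    best = None
    for lag in range(max_lag + 1):
        xs = queries[:len(queries) - lag]  # drop the last `lag` query counts
        ys = events[lag:]                  # drop the first `lag` event counts
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lag, r)
    return best

# Invented data: the query spike on day 2 precedes the news spike on day 4.
query_volume = [1, 2, 8, 3, 1, 1, 2, 1]
news_volume = [0, 1, 1, 2, 9, 3, 1, 1]
lag, r = best_lead(query_volume, news_volume, max_lag=3)
print(lag, round(r, 3))
```

A real system would also need to handle constant series (where the correlation is undefined), seasonality, and the many spurious correlations that arise when thousands of query streams are tested.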
Entity and Relation Types in Web Search: Annotation, Indexing and Scoring Techniques
Soumen Chakrabarti
IIT Bombay, India
The goal of our proposed project is to dramatically improve the quality of
complex search and aggregation tasks over text and semi-structured data by
annotating and exploiting entities and relations. We will explore several means
to this end. First, we wish to devise algorithms which, guided by query log
analysis, will create and maintain catalogs of entities, attributes, and
relations. Second, we plan to unify and extend existing information extraction
and integration techniques for cross-site, cross-page annotations that combine
links, layout, and text. Third, we plan to design practical, compact and
efficient indexes that support queries combining keywords with structures in a
knowledge base or ontology. Fourth, we want to invent scoring functions that
span linear text, 2D layouts, and graphical knowledge bases, and that can be
trained automatically through relevance feedback.
Deepening Search: From the Surface to the Deep Web
Kevin C. Chang
University of Illinois at Urbana-Champaign, US
In recent years, the Web has been rapidly deepened with the prevalence of
databases online. While the “surface Web” has linked billions of static HTML pages,
a far more significant amount of information is hidden in the “deep Web,” behind
the query forms of searchable databases. As the deep Web is largely invisible to
current search engines, users’ search requests do not reach this uncharted territory.
This proposal aims to open up the deep Web by extending users’ Web search beyond
the surface Web (as currently covered) and into the deep Web. We aim to provide
a Deep Web Search System that directs users to online query forms as “dynamic
links” into the deep Web, showing not only where these “doors” are, but also what
might lie behind them. We will develop this facility in the context of the overall
MetaQuerier project.
Discovering and Using Meta-Terms
Bruce Croft
University of Massachusetts at Amherst, US
Many queries, particularly “content-based” Web queries, contain terms that are
difficult to match directly with documents. We believe that many of these important
terms are in fact instances, examples, or more specific forms of query terms which
we call “meta-terms.” Transforming queries using replacements or expansions for
these terms can make a substantial difference to performance. In this research,
we will use both the Microsoft query logs and the TREC GOV2 collection to develop
techniques to discover meta-terms in queries and then mine related words from the
Web. The meta-term dictionary developed using these techniques will then be used
to carry out retrieval experiments and to test various approaches to query reformulation
or transformation. Evaluation will be done with the query log and click-through
data, and the TREC data will provide some solid baseline performance figures.
Incorporating Trust into Web Authority
Brian Davison
Lehigh University, US
The Web has become a battleground for control over search engine results. Search
providers continually work to improve the quality of their product, while marketers
strive for ever increasing visibility. Web link analysis is now well-targeted by
search engine marketers, and so “web spam” has become increasingly visible in Web
search. In this project, we incorporate a number of measures of trust and distrust
to improve estimates of Web page and site authority, reducing or eliminating the
effect of Web spam in the process.
Statistical Machine Learning for User Modelling
Zoubin Ghahramani
University of Cambridge, UK
Our aim is to model users, their relationships, and the information they seek,
using the query logs provided by Microsoft Research Live Labs. We will use advanced methods from
statistical machine learning, focusing particularly on fast approximate inference
algorithms so that we can make efficient use of the vast data sets provided. Some
of our specific aims include identifying trend-setters (users whose queries anticipate
those of others), multi-task collaborative learning (leveraging other users to help
personalize search), time series predictive modeling of click-through (predicting
the next query and clicked page), and identifying clusters of users, of queries,
and their network structure.
Combining Econometric and Text Mining Approaches for Measuring the Effect of Online Information Exchange
Panagiotis Ipeirotis, Anindya Ghose
New York University, US
You might have bought something on eBay and left a short feedback posting, summarizing
your interaction with the seller, such as “Lightning fast delivery! Sloppy packaging,
though.” Similarly, you might have visited Amazon and written a review for the latest
digital camera that you bought, such as “The picture quality is fantastic, but the
shutter speed lags badly.” The Internet has facilitated many such information exchanges
between buyers and sellers. For example, the exchange of news, personal viewpoints
and opinions, product reviews, and purchase decisions are all being strengthened
and extended in the context of the electronic markets. What is the economic value
of these comments? Increasingly, these information exchanges are having some business
impact that is being reflected in one or more economic variables (for example, product sales,
pricing premiums, profits) that can be measured to examine the effect of a particular
information exchange. The comment about “lightning fast delivery” can enhance a
seller’s reputation and thus allow the seller to increase the price of the listed
items by a few cents, without losing any sales. On the other hand, the feedback about
“sloppy packaging” can have the opposite effect on a seller’s pricing power. Similarly,
online reviews and conversations in blogs affect customers’ perception about the
quality of different products, which in turn can affect the total sales for that
product. Given the high volume of transactions that are completed on Internet-based
electronic markets, this can lead to a substantial change in firms’ profitability.
This research studies the “economic value of text” in such online settings, focusing
on three important and varied categories of information exchanges: reputation systems
in electronic markets, product recommendations in online communities, and the impact
of social media (search engines, wikis, and blogs) on sales. This research program
combines established techniques from economics with text mining algorithms from
computer science to measure the economic value of each text snippet and understand
how textual content in these systems influences economic exchanges between various
agents in electronic markets.
The Truth is Out There: Aggregating Answers from Multiple Web Sources
Amélie Marian
Rutgers University, US
The Internet has changed the way people look for information. Users now expect
the answers to their questions to be available through a simple Web search. Web
search engines are increasingly efficient at identifying the best sources for any
given keyword query and are often able to identify the answer within the sources.
Unfortunately, many Web sources are not trustworthy because of erroneous, misleading,
biased, or outdated information. In many cases, users are not satisfied with — or
do not trust — the results from any single source and prefer checking several
sources for corroborating evidence. The goal of this project is to provide an interface
that aggregates query results from different sources in order to save users the
hassle of individually checking query-related Web sites to corroborate answers.
In addition to listing the possible query answers from different Web sites, the
interface ranks the results based on the number, and importance, of the Web sources
reporting them. The existence of several sources providing the same information
is then viewed as corroborating evidence, increasing the quality of the corresponding
information.
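As an illustration only, the corroboration idea can be sketched in a few lines. The source names, the importance weights, and the additive scoring rule below are hypothetical assumptions, not the project's actual method: each distinct answer is scored by summing the importance of the sources that report it, and answers are listed from best to least corroborated.

```python
from collections import defaultdict

def rank_answers(reports, importance, default_weight=1.0):
    """Rank candidate answers by corroboration.

    reports:    iterable of (source, answer) pairs extracted from Web pages.
    importance: dict mapping a source to its (assumed) importance weight.
    Returns a list of (answer, score), highest score first.
    """
    scores = defaultdict(float)
    seen = set()
    for source, answer in reports:
        if (source, answer) in seen:
            continue  # a source corroborates a given answer only once
        seen.add((source, answer))
        scores[answer] += importance.get(source, default_weight)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented example: three sources report a year as the answer to one query.
reports = [
    ("site-a.example", "1889"),
    ("site-b.example", "1889"),
    ("site-c.example", "1887"),
    ("site-a.example", "1889"),  # duplicate report, ignored
]
importance = {"site-a.example": 2.0, "site-b.example": 1.0, "site-c.example": 1.5}
ranked = rank_answers(reports, importance)
print(ranked)
```

Summing weights treats sources as independent. Sources that copy from one another would inflate the score, so a fuller treatment would need to detect such dependence.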
Predictive Exploitation of Click-Through Knowledge
Alistair Moffat
University of Melbourne, Australia
Web retrieval systems will be more effective if they dynamically adapt to the
user’s information need according to how other users have responded to those same
documents when they were returned in response to the same or similar previous queries.
Access to the Microsoft query log data and click-through data will allow us to explore
this conjecture. We plan to use the data to construct synthetic “user sessions”
in which queries are combined with the matching click-throughs to establish a sequence
of operations for presumed single-topic searches. We will then retrieve the clicked
pages and judge them against our belief as to the nature of the underlying information
need. It will then be possible to investigate the extent to which subsequent issuers
of the same or similar queries could be given improved retrieval effectiveness,
assuming a range of possible user models as indicated by the click-through information
from earlier instances of that query. Finally, once we have built a model based
on the synthetic “sessions” extracted from the Microsoft logs, we will carry out
an experiment in which groups of users use (or do not use) an enhanced system that
makes use of previous click-through information to bias ranking orderings. The search
behind this experiment will be based on access via the MSN Search Software Development
Kit.
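One possible way to assemble such synthetic sessions is sketched below. The log schema (a user identifier, a timestamp in seconds, an event kind, and a value) and the 30-minute inactivity threshold are assumptions made for illustration. The abstract does not specify the log format or how session boundaries are chosen.

```python
def build_sessions(events, max_gap=1800):
    """Group log events into per-user sessions split on inactivity gaps.

    events:  list of (user_id, timestamp_seconds, kind, value) tuples,
             where kind is "query" or "click" (hypothetical schema).
    max_gap: seconds of inactivity after which a new session starts.
    Returns a list of sessions, each a time-ordered list of (ts, kind, value).
    """
    sessions = []
    last_time = {}  # user_id -> timestamp of that user's latest event
    current = {}    # user_id -> that user's currently open session
    for user, ts, kind, value in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in current or ts - last_time[user] > max_gap:
            current[user] = []              # open a new session
            sessions.append(current[user])
        current[user].append((ts, kind, value))
        last_time[user] = ts
    return sessions

# Invented log: user u1 refines a query, then returns much later.
events = [
    ("u1", 0, "query", "jaguar"),
    ("u1", 20, "click", "http://example.com/cars"),
    ("u1", 90, "query", "jaguar car price"),
    ("u1", 5000, "query", "weather"),
    ("u2", 10, "query", "python"),
]
sessions = build_sessions(events)
for s in sessions:
    print(s)
```

A time gap is only a rough proxy for a single-topic search. Query similarity could also be used to split or merge sessions.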
Social Search: Bringing the Social Component to the Web
Gerd Stumme
University of Kassel, Germany
Social bookmark tools like del.icio.us are rapidly emerging on the Web. Unlike
link-based search approaches à la PageRank, these systems provide personal recommendations
based on input from similar users. This new paradigm will change the way we
interact with the Web within the next few years. In particular, it will require
corresponding search functionality. Furthermore, these systems are more responsive
to upcoming topics, which can thus be discovered earlier and actively promoted.
Therefore, we will extend link-based search with social search, in order to provide
enhanced functionality and multiple search paradigms for the Web.
Mine Query/Click Log for Collaborative Internet Search
ChengXiang Zhai
University of Illinois at Urbana-Champaign, US
Search accuracy is closely related to how precise and discriminative a user’s
query is. Unfortunately, it is generally difficult for a user to know in advance
whether a particular query would be effective due to problems such as the ambiguity
of many terms and a possible mismatch between the terms used by document authors and
the user. As a result, a user often needs to iteratively refine a query many times
until eventually reaching a query that can return useful results — a process not
only time consuming, but also often requiring a great deal of knowledge about the
topic. However, for various reasons, different people may look for similar
information, and if some users already went through the process of refining queries
about a topic, we should be able to exploit their experiences to benefit other users
who are searching for similar information, which we refer to as “collaborative search.”
The goal of this project is to develop techniques to extract query refinement patterns
from the query/click log data collected by a search engine to support collaborative
search. The query/click log data, including users’ queries and viewed documents,
contains much valuable knowledge about query refinement accumulated from many users
and for all kinds of topics. We will apply statistical language models and text
data mining techniques to elicit such knowledge and exploit it to automatically
refine a user’s query or enable a user to refine a query more effectively. The techniques
to be developed would enable a Web search engine to improve its search performance
automatically over time as more and more user information is collected.