Has it become any easier to find a needle in a haystack in the
information age?
The surface web consists of tens of billions of pages in over 80 languages and is
growing rapidly. Beneath it lies a deep web of much greater size. Complexity,
size, and rate of expansion combine to make Web searching a significant
challenge. To transform raw data into knowledge that is relevant to the
information seeker, we need to go beyond string manipulation and move, for
example, toward the database of intentions proposed by John Battelle
(2005).[1]
Moreover, there is a growing demand for information to be accessible to anyone,
from anywhere, at any time.
The long-term goal of the Microsoft Research External Research & Programs
Internet Technology and Cultures initiative is to advance research in desktop,
mobile, and Web information discovery, recovery, and delivery by better
understanding people’s search behaviors and needs, and ultimately to build
new and relevant knowledge for the user.
As part of building the initiative’s roadmap, we have identified three
research themes for the next few years:
- Enabling Research with Real World Data
- Making the Web Meaningful and Social
- Search on the Go
Enabling Research with Real World Data
Should we be developing better algorithms and training sets, or should we
just be adding more data, or both? How much data is enough data to make
decisions? Banko and Brill (2001)[2] showed, in a disambiguation task, that by using
“more than a thousand times more data than had previously been used,” they “were
able to significantly reduce the error rate, compared to the best system trained
on the standard training size set.”
By providing large data sets, at Web scale, to the research community, we
hope to enable the next generation of search algorithms and to identify
bold, innovative approaches to information retrieval, data mining, machine
learning, and human-computer interaction, with the ultimate goal of creating
new technologies that can drastically change the way we interact with the Web.
We started this effort with the Request for Proposals (RFP) Accelerating
Search in Academic Research 2006,[3] for which we made available over 15 million
real query logs and click-throughs from MSN. The RFP, which generated over 180
proposals worldwide, was awarded to 12 Live Labs institutions[4]
that are posing some of the most compelling questions in search technology
today:
- Even if the user gets relevant results, can he
or she trust that information?
- What’s happening on that part of the Web
that’s not being crawled today, the so-called Deep Web?
- How can user behavior help predict economic or
social changes?
- Is search an inference problem?
- Is handling terabytes of data a trigger for
(re-)enabling Artificial Intelligence in search by pushing research into
new fields of computing?
Making the Web Meaningful and Social
This theme explores opportunities to put semantic and social computing ideas to
work. Acquiring semantic information has long been the bottleneck for the
semantic analysis of information. With Internet culture evolving rapidly, and
people becoming not just consumers of information but “makers of
information,”[5] is it now time to revisit this bottleneck? Doing so raises
the following questions:
- How do people collaborate on the Internet to
produce data and metadata?
- What are the processes and incentives for
creating “good” data as opposed to “vandalizing” an acquisition effort?
- What are the tradeoffs between “top-down”
approaches to meaning (such as reference ontologies) and “bottom-up”
approaches (such as folksonomies)?
- Is there value in a tagging platform?
- Where can the Semantic Web add value in
improving the Internet experience?
Search on the Go
GPS chips are becoming more common in cell phones, partly in response to
the 2005 government-mandated Enhanced 911 program, which helps emergency workers
find people who dial 911 from their cell phones. One in three mobile handsets
will be a “smart phone” or Web-enabled device by 2009 in Western Europe. Cell
phones are becoming India’s gateway to the Internet. In Japan, mobile devices
have become the leading channel for information exchange.
With this mix of technologies available and user demand expected to grow, it is
time to focus on how to “search on the go” and address questions such as:
- What do people search for when mobile? How do
people navigate on their mobile devices?
- What are the average query length and the
click-through rate?
- How can the surrounding environment be transformed
into relevant knowledge at the user’s fingertips?
- Is there a benefit to alerts when mobile (for
instance, receiving personally relevant messages such as “your flight is
2 hours late” or “shoe sale at your favorite shop”)?
- What are the security and privacy implications,
and how can the user remain in control of her information?