External Research & Programs

Internet Technologies and Cultures Call to Action 2007

Information Finding via Discovery, Recovery, and Delivery.


Has it become any easier to find a needle in a haystack in the information age?

The surface web consists of tens of billions of pages in over 80 languages and is growing rapidly. Beneath it lies a deep web of much greater size. Complexity, size, and rate of expansion combine to make Web searching a significant challenge. To transform raw data into knowledge that is relevant to the information seeker, we need to go beyond string manipulation and move, for example, toward the database of intentions proposed by John Battelle (2005).[1] Moreover, there is a growing demand for information to be accessible to anyone, from anywhere, at any time.

The long-term goal of the Microsoft Research External Research & Programs Internet Technologies and Cultures initiative is to advance research in desktop, mobile, and Web information discovery, recovery, and delivery through a better understanding of people’s search behaviors and needs and, ultimately, to build new and relevant knowledge for the user.

As part of building the initiative’s roadmap, we have identified three research themes for the next few years:

  • Enabling Research with Real World Data
  • Making the Web Meaningful and Social
  • Search on the Go

Enabling Research with Real World Data

Should we be developing better algorithms and training sets, or should we just be adding more data, or both? How much data is enough data to make decisions? Banko and Brill (2001)[2] showed, in a disambiguation task, that by using “more than a thousand times more data than had previously been used,” they “were able to significantly reduce the error rate, compared to the best system trained on the standard training size set.”

By providing large, Web-scale data sets to the research community, we hope to enable the next generation of search algorithms and to identify bold, innovative approaches to information retrieval, data mining, machine learning, and human-computer interaction, with the ultimate goal of creating new technologies that can drastically change the way we interact with the Web.

We started this effort with the Request for Proposals (RFP) Accelerating Search in Academic Research 2006,[3] for which we made available over 15 million real query logs and click-throughs from MSN. The RFP generated over 180 proposals worldwide, and awards went to 12 Live Labs institutions[4] that are posing some of the most compelling questions in search technology today:

  • Even if the user gets relevant results, can he or she trust that information?
  • What’s happening on that part of the Web that’s not being crawled today, the so-called Deep Web?
  • How can user behavior help predict economic or social changes?
  • Is search an inference problem?
  • Is handling terabytes of data a trigger for (re-)enabling Artificial Intelligence in search by pushing research into new fields of computing?

Making the Web Meaningful and Social

This theme explores opportunities to put semantic and social computing ideas to work. Acquiring semantic information has long been the bottleneck for semantic analysis of information. With Internet culture evolving rapidly, and people becoming not just consumers of information but “makers of information,”[5] is it now time to revisit this bottleneck? Questions to address include:

  • How do people collaborate on the Internet to produce data and metadata?
  • What are the processes and incentives for creating “good” data as opposed to “vandalizing” an acquisition effort?
  • What are the tradeoffs between “top-down” approaches to meaning (such as reference ontologies) and “bottom-up” approaches (such as folksonomies)?
  • Is there value in a tagging platform?
  • Where does the Semantic Web add value in improving the Internet experience?

Search on the Go

GPS chips are becoming more widely available on cell phones, partly in response to the 2005 U.S. government-mandated Enhanced 911 program, which helps emergency workers find people who dial 911 from their cell phones. One in three mobile handsets in Western Europe will be a “smart phone” or Web-enabled device by 2009. Cell phones are becoming India’s gateway to the Internet. In Japan, mobile devices have become the leading channel for information exchange.

With the technology mix available and user demand expected, it is time to focus on how to “search on the go” and to address questions such as:

  • What do people search for when mobile? How do people navigate on their mobile devices?
  • What are the average query length and the click-through rate?
  • How can the surrounding environment be transformed into relevant knowledge at your fingertips?
  • Is there a benefit to alerts when mobile (for instance, receiving relevant messages such as “your flight is 2 hours late” or “shoe sale at your favorite shop”)?
  • What are the security and privacy implications, and how can the user remain in control of her information?


[1] Battelle, J. (2005) “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture”, Penguin Books.

[2] Banko, M., and E. Brill (2001) “Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing”, HLT 2001.

[3] Microsoft Live Labs: Accelerating Search in Academic Research 2006 RFP

[4] For more information, visit the Microsoft Live Labs: Accelerating Search in Academic Research 2006 RFP Awards page and the Microsoft Live Labs Web site.

[5] See for instance “The Rise of Crowdsourcing” by Jeff Howe, June 2006.
 


 

 
Contact Us
  • Evelyne Viegas
    Microsoft Research
    evelynev at microsoft dot com

©2008 Microsoft Corporation. All rights reserved.