

External Research & Programs

Beyond Search – Semantic Computing and Internet Economics
Request for Proposals 2007

This RFP has been awarded. To view the award recipients, see Beyond Search RFP Awards.

DEADLINE EXTENSION for last date for submission of proposals:   

November 5, 2007, 5 PM PST.

Background

Has it become any easier to find a needle in a haystack or just buy a new needle in the information age?

The surface Web consists of tens of billions of pages in over 80 languages and is growing rapidly. Beneath this surface Web lies a deep Web of much greater size. Tens of thousands of merchants offer millions of products. Complexity, size, and rate of expansion combine to make finding information and products on the Web a significant challenge. The ‘user’ now participates as an ‘innovator’ and ‘contributor,’ adding to the size and content, but not necessarily the structure, of the Web.

Today, we are still in the early stages of the digital information age, with many opportunities remaining to greatly improve the ways in which information seekers find information.

Continuing to enhance our ability to find or supply information on the Internet requires breakthroughs in fundamental research. Key technologies needed to find, discover, extract, publish, share, or supply information, while ensuring that the online world does not become a primary source of information leaks, include:

  • Semantic Computing (making the web meaningful)
  • Internet Economics (understanding the commercial needs of the users)

To help advance these areas, Microsoft is making real-world search and ad data available to academia, addressing the academic need for more large-scale data and encouraging innovation in academic research. Microsoft believes that increasing the availability of relevant, large, and current data sets from Windows Live will support new data analyses and new algorithm development in Semantic Computing and Internet Economics.

Goals & Objectives

Microsoft would like to enable cross-disciplinary research in Semantic Computing, Internet Economics, and Online Advertising, and is looking for innovative proposals that explore the technical, research, societal, and commercial issues around the topics of “Semantic Computing” and “Internet Economics,” as described under Track 1 and Track 2 below.

Track 1: Semantic Computing

To transform raw data into information that is relevant to the information seeker, we need to go beyond string manipulation and move toward Battelle’s “database of intentions.” With the advent of large-scale text corpora, ontologies and rule-based systems proved either too costly to develop and maintain or inadequate for the accuracy, scalability, and adaptability needed for a pervasive task such as Internet search.

However, as we move from searching to actually “finding information” we need to consider approaches which add semantic value; but here we hit the acquisition bottleneck. Today, acquiring and maintaining (semi)structured information is still an expensive proposition, mainly left in the hands of experts. Semantics should not be the sole property of a few domain experts: in fact, each individual has some level of expertise. For semantic approaches to be successful at Internet scale, we need to focus on the following questions:

  • Semantic structure – How to account for the contextual elasticity and temporal fluidity of meanings (languages, people, and organizations do change)
  • Acquisition model – How to leverage the expertise of every Internet surfer, of every information consumer and producer (“wisdom of the crowds,” “human computation”)
  • Platforms and tools – What are the platforms and channels that can use semantic information, at web scale, to improve ‘findability’ of information and intent detection on the Internet?
  • Collaboration model – How can people create and/or collaborate on the Internet to produce data, metadata, thesauri, ontologies, or folksonomies to improve the Internet experience?
  • Intent detection – How can semantics and context help us discover the user’s intent?
  • Information access – How can semantics inform a model of information access which preserves data confidentiality and user privacy?

Track 2: Internet Economics

The Web plays a major role in meeting users’ commercial needs. In 2005, 70% of US adults used the Internet as an information source when shopping locally for products and services, and more than 34% had made an online purchase in the preceding year. By 2010, e-commerce revenue is expected to exceed $300 billion, reflecting a 14% compound annual growth rate over six years. Total spending by Internet advertisers in 2005 is estimated at $8.3 billion, an increase of 13.3% over 2004.
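The growth figures above can be checked with the standard compound-growth relation V_end = V_start × (1 + r)^n. The sketch below back-calculates the base-year values those figures imply; reading the 14% compound annual growth rate "over six years" as running from 2004 to 2010 is an assumption, and the function names are illustrative.

```python
# Sketch: back out base-year values implied by the growth figures above.
# Assumption: the 14% compound annual growth "over six years" ending in 2010
# is read as six compounding periods from 2004 to 2010.


def implied_base(final_value: float, rate: float, years: int) -> float:
    """Return the starting value V0 with V0 * (1 + rate) ** years == final_value."""
    return final_value / (1.0 + rate) ** years


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# E-commerce revenue: over $300 billion in 2010 at a 14% CAGR.
ecommerce_2004 = implied_base(300e9, 0.14, 6)

# Online ad spend: $8.3 billion in 2005, up 13.3% from 2004.
ad_spend_2004 = implied_base(8.3e9, 0.133, 1)

print(f"Implied 2004 e-commerce revenue: ${ecommerce_2004 / 1e9:.1f} billion")
print(f"Implied 2004 online ad spend:    ${ad_spend_2004 / 1e9:.2f} billion")
```

Under that reading, the 2010 projection implies roughly $137 billion of e-commerce revenue in 2004, and the 2005 advertising figure implies about $7.3 billion of 2004 spending; `cagr()` recovers the 14% rate from the two endpoints.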

Advertising has become an integral part of the Web browsing experience. All major search engines use advertising to help address the commercial needs of their users. Relevant, targeted advertisements can help users research products, compare prices and merchants, and finally make an online purchase. However, satisfying each user requires a sophisticated blend of user intent analysis and the selection of relevant, economically viable, high-quality, and informative advertisements.

Online advertising combines a number of research areas and can be seen as an interdisciplinary field that builds on the expertise of different domains, including web search, data mining, machine learning, auction theory, user modeling, and many others.

For advertising to better meet the commercial needs of users, we need to focus on improvements in all of these areas:

  • Commercial Intent Detection – Are users looking to buy a product or just searching for an answer?
  • Query Intent Inference – Is the user exploring or trying to find a specific product?
  • Ad Relevance – Which ads are relevant to a specific query issued by a specific user under a set of known circumstances?
  • Measures and Metrics – What utility function should be used to maximize user satisfaction? What about advertiser satisfaction?
  • Ad Ranking – Should ads be ranked by relevance, click-through rate or a combination of both?
  • User Targeting – What user information can be used to improve the user experience without compromising user privacy?
  • Adversarial Interactions – Can we detect all misleading advertisements before they are viewed by the user?
  • Forecasting and Trend Detection – Can we predict new product trends based on user search patterns?
  • Auction Theory – Should advertisers pay for user clicks or for transactions?
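The Ad Ranking and Auction Theory questions above can be made concrete with a small example. One widely studied answer to ranking "by relevance, click-through rate or a combination of both" is to rank ads by bid multiplied by predicted click-through rate and to charge each winner per click under a generalized second-price (GSP) rule. The sketch below assumes that pay-per-click design; it is one illustrative option, not a description of how adCenter ranks or prices ads, and the `Ad` type and `rank_and_price` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float            # advertiser's maximum cost per click (hypothetical units)
    predicted_ctr: float  # estimated click probability in this query context


def rank_and_price(ads: list[Ad], slots: int) -> list[tuple[str, float]]:
    """Rank ads by bid * predicted CTR (expected revenue per impression) and
    charge each winner, per click, the smallest amount that would still have
    kept it above the next-ranked ad (quality-weighted GSP rule)."""
    ranked = sorted(ads, key=lambda a: a.bid * a.predicted_ctr, reverse=True)
    outcome = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            runner_up = ranked[i + 1]
            price = runner_up.bid * runner_up.predicted_ctr / ad.predicted_ctr
        else:
            price = 0.0  # nobody ranked below; a real system would apply a reserve price
        outcome.append((ad.ad_id, round(price, 4)))
    return outcome


ads = [
    Ad("A", bid=2.00, predicted_ctr=0.05),  # score 0.10
    Ad("B", bid=3.00, predicted_ctr=0.02),  # score 0.06
    Ad("C", bid=1.00, predicted_ctr=0.04),  # score 0.04
]
print(rank_and_price(ads, slots=2))  # [('A', 1.2), ('B', 2.0)]
```

In this toy auction, ad A outranks ad B despite its lower bid because its predicted click-through rate is higher, and each winner pays no more than its own bid. Charging per transaction instead of per click would additionally require conversion data, which the asset description in this RFP does not list.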

Answering these and many other questions will help the Internet economics industry grow into a user-oriented tool rather than a distraction.

To advance academic research and publication in Internet research and, in particular, in data mining, information finding, information supply, and Internet economics, Microsoft intends to make available to the RFP awardees a Microsoft adCenter search query log excerpt with 100 million search queries, along with ad click logs sampled over a few months, and a Live Search query log excerpt with 15 million search queries with per-query search result click-throughs (execution of a limited license for use of these assets will be required). In addition, Microsoft adCenter will provide advertiser accounts to all winners.

adCenter assets will include:

  • Search queries
  • Ad impressions
  • Ad clicks

Live Search assets will include:

  • Search queries
  • Per-query search result click-through

In addition, Microsoft has already made available a Live Search Software Development Kit (SDK) that enables programmatic access to Live Search results for up to 25,000 queries per day, with 50 search results per query. The SDK is available at http://dev.live.com/livesearch/.

RFP awardees will be allowed up to 100,000 queries per day.
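The daily limits above translate into simple pacing arithmetic: spreading 100,000 queries evenly over a day means one query every 0.864 seconds (versus 3.456 seconds at the public limit of 25,000), and at 50 results per query an awardee could retrieve up to 5,000,000 results per day. The sketch below shows a hypothetical client-side guard for staying within the quota; it is not part of the Live Search SDK, and the midnight-UTC reset it uses is an assumption, not something the RFP states.

```python
import time

DAILY_QUOTA = 100_000       # awardee limit stated above
RESULTS_PER_QUERY = 50      # results returned per query, as stated above
SECONDS_PER_DAY = 24 * 60 * 60


def min_interval_seconds(daily_quota: int) -> float:
    """Even spacing between requests that uses the whole quota over one day."""
    return SECONDS_PER_DAY / daily_quota


class DailyQuota:
    """Hypothetical client-side counter (not part of the Live Search SDK).
    Assumes the quota resets at midnight UTC, which the RFP does not specify."""

    def __init__(self, limit: int, clock=time.time):
        self.limit = limit
        self.clock = clock
        self.day = int(clock() // SECONDS_PER_DAY)
        self.used = 0

    def try_acquire(self) -> bool:
        """Consume one query from today's budget; return False once it is spent."""
        today = int(self.clock() // SECONDS_PER_DAY)
        if today != self.day:  # a new UTC day has started: reset the counter
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


print(min_interval_seconds(DAILY_QUOTA))  # 0.864 seconds between queries
print(DAILY_QUOTA * RESULTS_PER_QUERY)    # 5000000 results per day
```

Injecting the clock makes the guard easy to test without waiting for a real day to pass.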

Microsoft encourages proposals for either track which describe innovative research using the adCenter, Live Search, and SDK assets above in bold, novel, and unconventional approaches to further Internet Research and related technologies, including interdisciplinary research. However, researchers are not required to make use of the assets described above and are encouraged to “think outside the box” in their proposals.

Awards

The total amount available under this request for proposals (RFP) is $1,000,000. Microsoft Research anticipates making approximately 20 awards averaging $50,000, with a maximum of $100,000 for any single award. All awards will be made in $US. Awards are generally made as unrestricted gifts to the institution. Outside the United States other local restrictions may apply to the terms of the award. For current policy regarding non-U.S. countries, please refer to http://research.microsoft.com/ur/us/fundingopps/faq.aspx

For all awards, payment of indirect costs (“overhead”) is not permitted.

Microsoft Research will take into account the reasonableness of the amount requested in any proposal in light of stated deliverables, local costs, etc., and reserves the right to fund proposals at an amount lower than requested if appropriate.

Awards are made for the purpose of seed-funding larger initiatives, proofs of concept, or demonstrations of feasibility. It is important to understand that funding will continue after the first year only in exceptional circumstances, and that the principal investigators should therefore make every effort to leverage Microsoft Research funds as one component of a diverse funding base in a larger or longer-running project.

Eligibility

Conditions of eligibility listed below will be strictly adhered to, so please read them carefully. Proposals not meeting all these criteria will not be considered.

  1. The proposing institution must be either:
    1. An accredited degree-granting college or university (or international equivalent) with non-profit status and awarding degrees at the baccalaureate level or above.
    2. A research institution with non-profit status.
  2. All qualifying institutions are eligible without regard for geographic location.
  3. An institution will be awarded a maximum of one gift per RFP, regardless of the number of proposals submitted from the institution. In the case of an RFP with multiple tracks, a maximum of one gift per RFP track will be awarded to the same institution. Collaborative proposals embracing multiple groups across the organization are encouraged.
  4. Proposals that are incomplete, inaccurate, request funds in excess of the maximum award available, or are otherwise not responsive to the stated aims, terms and conditions of this RFP will, at the sole discretion of Microsoft Research, be excluded from consideration.
  5. Proposals from or on behalf of persons participating in the evaluation process for this RFP will not be considered.
  6. Proposals should evidence a commitment to make all results arising from a funded project (including all intellectual property in those results) broadly available by either: (i) dedicating such results to the public domain (for example, through publishing); or (ii) making the results available under a non-restrictive license that allows modification and redistribution without any significant restrictions or conditions, including so-called “reach through” provisions that require publication of source code. An example of an acceptable license is the BSD license available at http://www.opensource.org/licenses/bsd-license.html, whereas the widely used GPL and LGPL licenses are not acceptable.
  7. Proposals should evidence willingness to contribute any resulting curriculum material to the MSDN Academic Alliance Repository at http://www.msdnaacr.net/curriculum/facetmain.aspx.
  8. The receiving institution must agree that awards made as unrestricted gifts will not be subject to indirect costs or overhead charges, and that these may not be included in the budget for the proposed project.
  9. While the use of Microsoft technologies is not a condition of this RFP, any proposal relying exclusively on non-Microsoft technologies should justify why this must be the case. Please note that ordinary use of Microsoft Office applications will not be compelling in itself, although innovative uses of Office applications (or the use of applications such as SQL Server, Visual Studio, C#, .NET, and Windows Presentation Foundation) are acceptable and encouraged. We are not able to provide support for the development of exclusively Java- or Linux-based applications. Use of non-Microsoft applications on Windows, cross-platform development, and interoperability with other operating systems and applications are all encouraged.
  10. Awardees will be expected to participate in a workshop to share the results of the research funded under this award with the other awardees and Microsoft. Microsoft will cover travel expenses for the workshop.
  11. Principal Investigators interested in accessing the query log excerpt will need to sign a Licensing Agreement. Terms of the license will allow for publication of results but may restrict redistribution of the data and publication of detailed excerpts of the data.

Submission Process

Proposals will be accepted in electronic form only at http://microsoft.redwhale.com. Proposals submitted to Microsoft Research will not be returned. Microsoft Research cannot assume responsibility for the confidentiality of information in submitted proposals. Therefore, proposals should not contain information that is confidential, restricted or sensitive. Microsoft Research reserves the right to make public proposals that receive awards, except those portions containing budgetary or personally identifiable information.

The submission process includes two parts.

  1. Brief summary and contact information. Applicants must provide full contact information for principal investigators, the amount requested, a brief abstract, and the track selected for submission (Track 1 or Track 2). This information should be entered into the web forms during the proposal submission process.
  2. Complete proposal containing full details of the proposed project: 7 pages maximum, 10-point font or larger, double-spaced, in either Microsoft Word or PDF format.

Proposals should address each of the items listed below under separate numbered headings:

  1. Problem Statement: What is the problem or curriculum area addressed by the proposal, and why is it important? If successful, what is the project’s potential contribution to the field? Cite relevant work in the field as appropriate.
  2. Expected Outcomes: What tangible assets, if any, will be created or produced as a result of the proposed project? How will the results of this project be disseminated to others?
  3. Schedule: When is the project to be completed? What milestones will be used to measure progress of the project and when will they be completed? (If the project described is part of a larger ongoing research program, estimate the time for completion of this project only).
  4. Use of Funds: Provide a budget ($US) describing how the award will be used, including purchases of hardware or software, salaries, and other costs. Microsoft software or licenses requested should be listed in the budget, but the cost should be given as zero dollars. The budget does not have to be detailed, and should be presented as a table with the total budget request clearly indicated. Please note that, because awards are made as unrestricted gifts, Microsoft policy prohibits the payment of indirect cost (“overhead”).
  5. Use of Microsoft Technologies: Describe the Microsoft tools and technologies (if any) to be used in this project. If software is to be developed, give details of the tools to be used, the number of software developers and the proposed timescale. Does the software to be developed require the incorporation of code from commercial or public-domain libraries? If so, please give details.
  6. Related Research: Give a brief summary of the current state of the art in this field, including references where appropriate.
  7. Dissemination and Evaluation: How will the results of this project be evaluated (if appropriate), and how will they be disseminated to others? Under what general license terms will the results be made available?
  8. Other Support: Describe other contributions to this project (cash, goods, or services), if any, excluding such things as use of university facilities otherwise provided on an ongoing basis. Please note: authors of winning proposals will be required to submit an original letter on department letterhead certifying the commitment of any additional or matching support described in the proposal.
  9. Qualifications of Principal Investigator: Include a brief description of any relevant prior research, teaching, publication or other professional experience. A detailed vita or list of publications is not required.

Please do not submit any confidential materials to Microsoft.

Selection Process and Criteria

All proposals in compliance with the eligibility criteria and received by the deadline will be peer-reviewed by a panel of subject-matter experts chosen by Microsoft Research and adCenter; the panel may include experts from outside the company. Based on evaluations by the review panel, Microsoft will select the most worthy proposals for funding. Microsoft reserves the right to fund winning proposals at an amount greater or less than the amount requested, up to the stated maximum amount for individual awards. Due to the volume of submissions and for legal reasons, Microsoft Research cannot provide individual feedback on proposals that are not funded.

Authors (including co-investigators) of winning proposals will be required to grant Microsoft permission for the use of their name, image, institutional affiliation and related professional information in press releases or other forums for publication of their award. Microsoft Research may also request assistance with the preparation of posters, slides or other materials, and periodic reports on the status of funded projects.

All proposals will be evaluated based upon the following criteria:

  1. Novelty and aspiration, with well-defined goals and objectives that, if achieved, have the potential to significantly impact the semantic computing field and the online advertising industry. These goals must be achievable within the timescale of the funded project and, where appropriate, placed in the context of milestones in a larger or longer-running project.
  2. Potential for wide dissemination and use of intellectual property created, including specific plans for publications, conference presentations, distance learning, etc., as well as plans to distribute content in multiple formats or languages.
  3. Ability to complete the project including the adequacy of resources available, reasonableness of timelines, and number and qualifications of identified contributors.
  4. Qualifications of principal investigator including previous history of work in the area, successful completion of previous funded projects, teaching awards, books published, etc.
  5. Use of Microsoft tools and technologies: proposals should clearly indicate the Microsoft tools and technologies to be used in the project, or if no such technologies can be used, a clear statement should be made why this is the case.
  6. Leveraging of other resources: preferential consideration will be given to proposals that use the data assets (queries, ads) provided as part of this RFP and Microsoft technologies (the Live SDKs described in Additional Resources below), to the extent applicable. Also considered are additional sources of funding to build larger or longer-running projects and/or leveraging of other projects or resources in the field.

Schedule and Deadlines

Announcement: September 5, 2007

First date for submission of proposals: September 7, 2007

DEADLINE EXTENSION for last date for submission of proposals: November 5, 2007, 5 PM PST
Note: PST = UTC/GMT-8.

Notification of awards: December 5, 2007

Please be advised that we are obligated to strictly adhere to the deadline date and time. The application system will not accept submissions after the deadline has expired. Exceptions to this policy cannot be granted. It is advisable to upload your submission well in advance of the deadline.
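Applicants outside the Pacific time zone can convert the deadline with fixed-offset arithmetic. US daylight saving time ended on November 4, 2007, so Pacific time on November 5 was standard time (UTC-8), and 5 PM PST corresponds to 01:00 UTC on November 6, 2007. The sketch below uses Python's standard `datetime` module; the helper name is hypothetical, and fixed offsets do not account for daylight-saving rules at the destination.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset, matching the RFP's note that PST = UTC/GMT-8.
PST = timezone(timedelta(hours=-8), "PST")

deadline = datetime(2007, 11, 5, 17, 0, tzinfo=PST)  # November 5, 2007, 5 PM PST


def deadline_at_offset(hours: float) -> datetime:
    """Express the deadline in another fixed UTC offset (e.g. 1 for CET)."""
    return deadline.astimezone(timezone(timedelta(hours=hours)))


print(deadline.astimezone(timezone.utc).isoformat())  # 2007-11-06T01:00:00+00:00
print(deadline_at_offset(1).isoformat())              # 2007-11-06T02:00:00+01:00
```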

Asset Description

No Personally Identifiable Information will be released as part of the data assets.

Microsoft adCenter 2007 search query log excerpt:

  • 100 million queries
  • Sampled over a few months
  • Queries from the US site (mostly in English)

Per query attributes include:

  • Session ID
  • Time-stamp
  • Query string
  • Ad impressions
  • Ad clickthroughs

Data per query for each result clicked:

  • Domain
  • Associated query
  • Position on results page
  • Time-stamp
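One of the simplest measures that can be computed from the per-query attributes above is ad click-through rate: ad clicks divided by ad impressions, aggregated per query string. The sketch below assumes each log row has been parsed into a record holding impression and click counts; the record layout, field types, and query normalization are hypothetical, since the release format is not described here, and the rows are toy values rather than real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AdQueryRecord:
    """Hypothetical in-memory form of one adCenter log row; the field types
    (and representing impressions and clicks as counts) are assumptions."""
    session_id: str
    timestamp: str       # actual timestamp format is not specified
    query: str
    ad_impressions: int  # number of ads shown for this query
    ad_clicks: int       # number of those ads that were clicked


def ad_ctr_by_query(records):
    """Aggregate ad click-through rate (clicks / impressions) per query string."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for r in records:
        key = " ".join(r.query.lower().split())  # simple normalization (assumed)
        shown[key] += r.ad_impressions
        clicked[key] += r.ad_clicks
    # Queries that never showed an ad have no defined CTR and are skipped.
    return {q: clicked[q] / shown[q] for q in shown if shown[q] > 0}


# Toy rows for illustration only.
rows = [
    AdQueryRecord("s1", "2007-06-01T10:00:00", "Cheap Flights", 4, 1),
    AdQueryRecord("s2", "2007-06-01T10:05:00", "cheap  flights", 4, 0),
    AdQueryRecord("s3", "2007-06-01T10:07:00", "weather", 0, 0),
]
print(ad_ctr_by_query(rows))  # {'cheap flights': 0.125}
```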

Live Search 2006 search query log excerpt:

  • 15 million queries
  • Sampled over one month
  • Queries from the US site (mostly English)

Per query attributes include:

  • Session ID
  • Time-stamp
  • Query string
  • Number of results on results page
  • Results page number

Data per query for each result clicked:

  • URL
  • Associated query
  • Position on results page
  • Time-stamp
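The per-click attributes of the Live Search excerpt support position analyses, a common starting point for studying how result rank affects clicks. The sketch below computes the share of clicks landing at each results-page position; the `ResultClick` record, its field types, and 1-based positions are assumptions about a parsed form of the data, and the clicks are toy values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ResultClick:
    """Hypothetical in-memory form of one clicked-result row from the
    Live Search log excerpt; field types are assumptions."""
    url: str
    query: str
    position: int   # rank on the results page; assumed to start at 1
    timestamp: str  # actual timestamp format is not specified


def click_share_by_position(clicks):
    """Fraction of all result clicks that landed at each results-page position."""
    counts = Counter(c.position for c in clicks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n / total for pos, n in sorted(counts.items())}


toy_clicks = [ResultClick("http://example.com/", "example", p, "") for p in (1, 1, 1, 2, 3)]
print(click_share_by_position(toy_clicks))  # {1: 0.6, 2: 0.2, 3: 0.2}
```

These shares describe where clicks land; turning them into per-position click-through rates would also require the per-query "Number of results on results page" attribute to count how often each position was displayed.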

Live Search SDK:

  • Up to 100,000 queries per day
  • 50 results per query

Additional Resources

Beyond Search FAQ
For additional information, please review the Beyond Search FAQ.

Microsoft Developer Center
The Microsoft Developer Center (http://msdn.microsoft.com/) features tools and interfaces to enable developers to plug into Internet services and applications. Some of the Software Development Kits (SDKs) are described below.

Live Search SDK
The Live Search SDK (http://dev.live.com/livesearch/) provides an Extensible Markup Language (XML) Web service interface through a SOAP API. The Live Search Web Service enables you to submit queries to the Live Search engine and retrieve results from it.

Keyword Services Platform
The Keyword Services Platform (KSP) (http://ksp.microsoft.com/) is a platform that contains a set of providers (algorithms). These algorithms mine various data sources (search and paid search logs, Passport, advertising databases, etc.) and provide keyword intelligence through a set of Web services APIs. The keyword services include term categorization, term suggestion, keyword extraction, term forecasting, term monetization, etc. Developers can use these services to build the next generation of advertising applications and beyond. Microsoft adCenter will provide all winners with advertiser accounts for using KSP.

Microsoft adCenter Account
Microsoft adCenter (http://adCenter.microsoft.com/) will provide all winners with access to adCenter advertiser accounts.

Interactive Virtual Earth SDK
The interactive Virtual Earth SDK, found at http://dev.live.com/virtualearth/sdk/, gives you a new way of learning how to use the Microsoft Virtual Earth map control APIs. Rather than hunting for the correct object and method names, you can find the code you need by choosing the tasks you want to accomplish. The Interactive SDK shows you how the map control works and provides the complete code you need to implement the task on your own page.

MapPoint Web Service
The MapPoint Web Service, found at http://www.microsoft.com/mappoint/default.mspx, is a programmable web service which can be used by software developers to integrate location-based services such as maps, driving directions, and proximity searches into software applications and business processes, providing better discoverability of store locations and business assets.

Please address any questions to “msrfp07[at]microsoft[dot]com”. Please put “Beyond Search – Semantic Computing and Internet Economics” in the subject line of your e-mail message to ensure a prompt and proper response.

 

©2008 Microsoft Corporation. All rights reserved.