Data Exploration

This project pursues research on data exploration, identifying flexible techniques for querying, browsing, and aggregating data. One of our goals is to support approximate matches and ranked search in the database context. We would also like to enable data browsing and querying services for XML that can interoperate across text, structured, and semi-structured (e.g., mail messages) data. We also investigate efficient approximate query processing techniques for answering ad hoc aggregate queries (e.g., decision support or OLAP queries).
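As a rough illustration of approximate query processing for an aggregate query, the sketch below estimates a SUM from a uniform random sample and scales it up to the full table. Uniform sampling is only one simple option and is not necessarily the technique studied in this project; the function name `approximate_sum` and its parameters are hypothetical.

```python
import random

def approximate_sum(values, sample_fraction, seed=0):
    """Estimate sum(values) from a uniform random sample (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, n)  # sample n rows without replacement
    # Scale the sample sum by the inverse sampling rate.
    return sum(sample) * len(values) / n

# Hypothetical "sales" column with 10,000 rows; read 5% of them.
sales = [float(i % 50) for i in range(10_000)]
estimate = approximate_sum(sales, sample_fraction=0.05)
exact = sum(sales)
```

The estimate trades accuracy for speed: only a fraction of the rows is read, and the error shrinks as the sample grows.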

Goal

Keyword search over web and enterprise documents is a very popular mechanism for finding relevant information. In both enterprise and web scenarios, document collections coexist with large structured databases. Therefore, keyword search over structured databases, particularly in collections involving both structured and unstructured documents, is an important problem. In the data exploration project, we explore the algorithmic and systems issues arising out of the goal of searching and analyzing document collections and structured databases together. We want to enable two broad keyword search scenarios.

First, we want to identify structured database objects or entities relevant to a query, even if the query keywords are not present in the entity name or description columns. Identifying entities in a database (e.g., products) for queries in which not all query keywords match those in an entity name or description is an important and challenging problem. For example, we may want to return relevant digital cameras for a user query such as [fast action digital camera], which searches for digital cameras suited to taking good pictures of fast-moving objects. This functionality is very useful for improving vertical search engines as well as for enhancing web search (or, in general, document search). We are studying the algorithmic and systems issues arising out of this goal. Many of the techniques we develop for this goal also apply to improving individual components of a web search engine, such as query classification.
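One way to picture this scenario, offered as a minimal sketch rather than as the project's actual method, is to score documents against the query and then credit each document's score to the entities it mentions. The toy data, the entity names, and the functions `score_document` and `rank_entities` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, each paired with the entities it mentions.
DOCUMENTS = [
    ("Camera A is great for fast action shots at sports events", ["Camera A"]),
    ("Camera B is a compact digital camera for travel", ["Camera B"]),
    ("For fast moving kids, Camera A beats Camera B", ["Camera A", "Camera B"]),
]

def score_document(query_terms, text):
    """Fraction of query terms that appear in the document text."""
    words = set(text.lower().replace(",", " ").split())
    return sum(1 for t in query_terms if t in words) / len(query_terms)

def rank_entities(query, documents):
    """Rank entities by the summed scores of the documents mentioning them."""
    query_terms = query.lower().split()
    scores = defaultdict(float)
    for text, entities in documents:
        s = score_document(query_terms, text)
        for entity in entities:
            scores[entity] += s
    # Highest score first; ties broken by entity name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_entities("fast action digital camera", DOCUMENTS)
```

Here "Camera A" ranks first even though its name contains neither "fast" nor "action", because the documents that mention it match those keywords.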

Second, we want to enable efficient ranked keyword search on logical entities (obtained by joining multiple relations) in databases without materializing them. We are studying the algorithmic and systems issues arising out of this goal in the context of full-text search in database systems and in the context of enterprise search engines. A related systems problem that we study is the efficient processing of keyword queries in IR engines.
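The idea of searching a logical entity without materializing it can be sketched as follows, under strong simplifying assumptions: each relation gets its own inverted index keyed by the join key, and per-relation term counts are merged by key at query time, so no joined row is ever built. The relations, the scoring by raw term frequency, and the functions `build_index` and `top_k` are hypothetical and are not taken from the project.

```python
from collections import defaultdict
import heapq

# Hypothetical relations that share a product key.
PRODUCTS = {1: "compact digital camera", 2: "rugged action camera"}
REVIEWS = [(1, "slow autofocus but sharp"),
           (2, "great for fast sports"),
           (2, "fast burst mode")]

def build_index(rows):
    """Build term -> {entity_key: term_count} from (entity_key, text) rows."""
    index = defaultdict(lambda: defaultdict(int))
    for key, text in rows:
        for term in text.lower().split():
            index[term][key] += 1
    return index

def top_k(query, indexes, k):
    """Merge per-relation postings by key and return the k best entities."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for index in indexes:
            # .get avoids creating empty postings for unseen terms.
            for key, count in index.get(term, {}).items():
                scores[key] += count
    # Highest score first; ties broken by key.
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))

product_index = build_index(PRODUCTS.items())
review_index = build_index(REVIEWS)
best = top_k("fast camera", [product_index, review_index], k=1)
```

Product 2 wins because "camera" matches its product row and "fast" matches two of its reviews, although no single stored row contains both keywords.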

Once we have determined a relevant set of structured entities from one or more 'vertical' databases for a given search query, we need to integrate these with 'regular' web search results. State-of-the-art web search engines typically show content from a variety of sources for many queries; because the space available on the result page is limited, this raises the problem of selecting which content types to display.

In this context, we have studied the problems of (a) selecting an appropriate vertical (database) from which to display content, (b) predicting the click-through rates for such content, and (c) designing specialized index structures for matching advertisements to search queries.

If you have questions about this project, please contact the Data Exploration research team (dmx@microsoft.com).

Publications