Data Exploration

 
Goal

This project investigates data exploration techniques that offer flexible ways to query, browse, and aggregate data. One of our goals is to support approximate matching and ranked search in the database context. We would also like to enable data browsing and querying services for XML that interoperate across text, structured, and semi-structured (e.g., mail messages) data. Finally, we investigate efficient approximate query processing techniques for answering ad hoc aggregate queries (e.g., decision support or OLAP queries).

 
People

Sanjay Agrawal

Surajit Chaudhuri

Kaushik Chakrabarti

Venkatesh Ganti

Arnd Christian König

Vivek Narasayya

Dong Xin

 
Publications

The following papers are available in PDF format.

Chaudhuri S., Church K., König A.C. and Sui L., Heavy-Tailed Distributions and Multi-Keyword Queries. Proceedings of ACM SIGIR, Amsterdam, Netherlands, 2007. [pdf version]

Chakrabarti K., Ganti V., Han J. and Xin D., Ranking Objects Based on Relationships: Computing Top-K Over Aggregation. Proceedings of ACM SIGMOD, Chicago, USA, 2006. [pdf version]

König A.C. and Brill E., Reducing the Human Overhead in Text Categorization. Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, USA, 2006. [pdf version]

Chakrabarti K., Chaudhuri S. and Hwang S., Automatic Categorization of Query Results. Proceedings of ACM SIGMOD, Paris, France, 2004. [pdf version]

Chaudhuri S., Das G. and Srivastava U., Efficient Use of Block-Level Sampling in Statistics Estimation. Proceedings of ACM SIGMOD, Paris, France, 2004. [pdf version]

Babcock B., Chaudhuri S. and Das G., Dynamic Sample Selection for Approximate Query Processing. Proceedings of ACM SIGMOD, San Diego, USA, 2003. [pdf version]

Agrawal S., Chaudhuri S., Das G. and Gionis A., Automated Ranking of Database Query Results. Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, USA, 2003. [pdf version]

Agrawal S., Chaudhuri S. and Das G., DBExplorer: A System for Keyword Search over Relational Databases. Proceedings of the 18th International Conference on Data Engineering, San Jose, USA, 2002. [pdf version]

Chaudhuri S., Das G. and Narasayya V., A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries. Proceedings of ACM SIGMOD, Santa Barbara, USA, 2001. [pdf version of conference paper] [full paper (Microsoft Technical Report)]

Chaudhuri S., Das G., Datar M., Motwani R. and Narasayya V., Overcoming Limitations of Sampling for Aggregation Queries. Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, 2001. [pdf version]

Charikar M., Chaudhuri S., Motwani R. and Narasayya V., Towards Estimation Error Guarantees for Distinct Values. 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Dallas, USA, 2000. [Postscript version]

Chaudhuri S. and Gravano L., Evaluating Top-k Selection Queries. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, UK, 1999. [pdf version]

Chaudhuri S., Motwani R. and Narasayya V., On Random Sampling over Joins. Proceedings of ACM SIGMOD, Philadelphia, USA, 1999. [pdf version]

Chaudhuri S., Motwani R. and Narasayya V., Random Sampling for Histogram Construction: How much is enough? Proceedings of ACM SIGMOD, Seattle, USA, 1998. [pdf version]

If you have questions about this project, please contact Surajit Chaudhuri (surajitc@microsoft.com).


©2008 Microsoft Corporation. All rights reserved.