Data Mining: Efficient Data Exploration and Modeling
Goal
The Knowledge Discovery and Data Mining (KDD) process consists of data selection, data cleaning, data transformation and reduction, mining, interpretation and evaluation, and finally incorporation of the mined "knowledge" into the larger decision-making process. The goals of this research project include developing efficient computational approaches to data modeling (finding patterns), data cleaning, and data reduction for large, high-dimensional databases. The project uses methods from databases, statistics, algorithmic complexity, and optimization to build efficient, scalable systems that are tightly integrated with relational and OLAP database structures. This integration lets database developers access data mining technology directly and apply it in their own applications.
Current Status
This is a long-term project. In the short term, the focus will be on automating the data mining process over data warehouses. This includes work in the following areas:
People
Publications
Chaudhuri S., Narasayya V., and Sarawagi S. Efficient Evaluation of Queries with Mining Predicates. Proceedings of the 18th International Conference on Data Engineering, San Jose, USA, 2002.
Netz A., Bernhardt J., Chaudhuri S., and Fayyad U. Integrating Data Mining with SQL Databases: OLE DB for Data Mining. Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, 2001.
Fayyad U. M., Chaudhuri S., and Bradley P. S. Data Mining and its Role in Database Systems. Tutorial, Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, 2000.
Netz A., Chaudhuri S., Bernhardt J., and Fayyad U. Integration of Data Mining and Relational Databases. Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, 2000.
Bernhardt J., Chaudhuri S., and Fayyad U. Scalable Classification over SQL Databases. Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, 1999.
Data Mining and Database Systems: Where is the Intersection? Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, March 1998.
Graefe G., Fayyad U. M., and Chaudhuri S. On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases. Proceedings of the Fourth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA, 1998.
If you have questions about this project, please contact Surajit Chaudhuri (surajitc@microsoft.com).