Venkatesh Ganti
|

Data Management, Exploration and Mining Group
Microsoft Research
One Microsoft Way
Redmond, WA 98052
Fax: (425) 936-7329
Email: vganti AT microsoft.com
|
Research Interests |
Keyword Search: By relating large document collections to large repositories of structured data, it is possible to define new keyword search and analysis functionality. In the data exploration project, I am investigating the research issues that arise from this goal.
Data Cleaning: Decision support analysis on data warehouses influences important business decisions; therefore, the accuracy of such analysis is crucial. However, data received at the data warehouse from external sources usually contains errors: spelling mistakes, inconsistent conventions, etc. Hence, a significant amount of time and money is spent on data cleaning, the task of detecting and correcting errors in data. The goal of our Data Debugger project is to develop a platform of efficient and accurate primitive operators, along with design tools, that enables programmers to develop accurate and scalable data cleaning solutions for a variety of domains.
|
Some Recent Publications |
Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Raghav Kaushik. Example-driven design of efficient record matching queries. VLDB 2007.
Surajit Chaudhuri, Anish Das-Sarma, Venkatesh Ganti, Raghav Kaushik. Leveraging aggregate constraints for deduplication. SIGMOD 2007.
Arvind Arasu, Venkatesh Ganti, Raghav Kaushik. Efficient exact set-similarity joins. VLDB 2006.
Surajit Chaudhuri, Venkatesh Ganti, Raghav Kaushik. A primitive operator for similarity joins in data cleaning. ICDE 2006.
Surajit Chaudhuri, Venkatesh Ganti, Rajeev Motwani. Robust identification of fuzzy duplicates. ICDE 2005.
Eugene Agichtein, Venkatesh Ganti. Mining reference tables for automatic text segmentation. SIGKDD 2004.
Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano. Selectivity estimation for string predicates: Overcoming the underestimation problem. ICDE 2004.
Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani. Robust and efficient fuzzy match for online data cleaning. SIGMOD 2003.
Rohit Ananthakrishna, Surajit Chaudhuri, Venkatesh Ganti: Eliminating Fuzzy Duplicates in Data Warehouses. VLDB 2002
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Wei-Yin Loh: A Framework for Measuring Differences in Data Characteristics. JCSS 64(3): 542-578 (2002)
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: DEMON: Mining and Monitoring Evolving Data. TKDE 13(1): 50-63 (2001)
Venkatesh Ganti, Mong Li Lee and Raghu Ramakrishnan, ICICLES: Self-tuning samples for approximate query answering. Proceedings of the 26th International Conference on Very Large Databases (VLDB00), Cairo, Egypt, 2000.
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: Mining Data Streams under Block Evolution. SIGKDD Explorations 3(2): 1-10 (2002)
Surajit Chaudhuri, Umeshwar Dayal, Venkatesh Ganti: Database Technology for Decision Support Systems. IEEE Computer 34(12): 48-55 (2001)
Thanks to Michael Ley, I can point you to a complete list of my papers.
|
Book Chapters |
Surajit Chaudhuri, Umeshwar Dayal, Venkatesh Ganti: Data Management Technology for Decision Support Systems. Advances in Computers. Vol 62. 2004.
Venkatesh Ganti and Raghu Ramakrishnan, Mining and Monitoring Evolving Data. Handbook of Massive Datasets, Kluwer Academic Publishers.
|
Professional Activities |
Co-chaired (with Andrew Tomkins) the 2007 ACM SIGKDD Industrial Track
Co-chaired (with Felix Naumann) the 2007 QDB workshop (held in conjunction with VLDB 2007)
Co-chaired (with Minos Garofalakis) the 2002 DMKD workshop (held in conjunction with SIGMOD 2002)
Associate editor for the IEEE TKDE journal
Member of the Program Committees for SIGMOD 2008, VLDB 2007, ICDE 2005, KDD 2004, CIKM 2004, DMKD 2004, ICDM 2004, ICML 2003, PAKDD 2003, DMKD 2003, ICDE 2002, DMKD 2002, DMKD 2001.
|
Education |