Research Interests
Keyword Search: By relating large document collections to large repositories of structured data, it is possible to define new keyword search and analysis functionality. In the data exploration project, I am investigating the research issues raised by this goal.
Data Cleaning: Decision support analysis on data warehouses influences important business decisions, so the accuracy of such analysis is crucial. However, data that a warehouse receives from external sources usually contains errors such as spelling mistakes and inconsistent conventions. Hence, a significant amount of time and money is spent on data cleaning, the task of detecting and correcting errors in data. The goal of our Data Debugger project is to develop a platform of efficient and accurate primitive operators, along with design tools, that enables programmers to develop accurate and scalable data cleaning solutions for a variety of domains.
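One kind of primitive operator for data cleaning is a similarity join, which pairs records whose token sets overlap enough to be likely duplicates (for example, "Microsoft Corp Redmond WA" and "Microsoft Corporation Redmond WA"). The following is a minimal Python sketch, not the Data Debugger implementation: it assumes whitespace tokenization and Jaccard similarity, and uses the standard prefix-filtering technique to avoid comparing every pair of records. The names `similarity_join` and `jaccard` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_join(records, threshold):
    """Self-join a list of strings, returning (i, j, sim) triples with i < j
    and Jaccard(tokens_i, tokens_j) >= threshold.

    Hypothetical sketch: whitespace tokenization, 0 < threshold <= 1.
    Records with no tokens are never matched.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    token_sets = [set(r.lower().split()) for r in records]
    # Global token order: rarest tokens first, so prefixes are selective.
    freq = Counter(tok for s in token_sets for tok in s)
    index = defaultdict(list)  # token -> ids of earlier records whose prefix has it
    results = []
    for i, tokens in enumerate(token_sets):
        ordered = sorted(tokens, key=lambda t: (freq[t], t))
        # Jaccard >= t implies overlap >= ceil(t * |r|). The small epsilon keeps
        # float error from inflating this bound, which could drop true matches.
        min_overlap = math.ceil(threshold * len(ordered) - 1e-9)
        prefix_len = len(ordered) - min_overlap + 1
        candidates = set()
        for tok in ordered[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # The prefix filter only prunes; verify every candidate exactly.
        for j in sorted(candidates):
            sim = jaccard(token_sets[j], tokens)
            if sim >= threshold:
                results.append((j, i, sim))
    return results


rows = [
    "Microsoft Corporation Redmond WA",
    "Microsoft Corp Redmond WA",
    "Microsoft Corporation Redmond Washington",
    "Apple Inc Cupertino CA",
]
print(similarity_join(rows, 0.5))  # [(0, 1, 0.6), (0, 2, 0.6)]
```

If two records reach the threshold, their first common token in the global order must fall inside both prefixes, so no qualifying pair is missed. Ordering tokens by ascending frequency keeps prefixes made of rare tokens, so few unrelated records share a prefix token; since every candidate is verified exactly, the filter affects only speed, not the result.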
- Venkatesh Ganti, Arnd Christian König, and Xiao Li, Precomputing Search Features for Fast and Accurate Query Classification, in WSDM 2010, Association for Computing Machinery, Inc., 4 February 2010
- Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin, Exploiting Web Search To Generate Synonyms For Entities, in 18th International World Wide Web Conference, Association for Computing Machinery, Inc., April 2009
- Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, and Dong Xin, Exploiting Web Search Engines to Search Structured Information, in 18th International World Wide Web Conference, IEEE, April 2009
- Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Christian König, and Dong Xin, Query Portals: Dynamically Generating Portals for Web, in 18th International World Wide Web Conference, IEEE, April 2009
- Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin, Mining Document Collections to Facilitate Accurate Approximate Entity Matching, in VLDB, Very Large Data Bases Endowment Inc., 2009
- Venkatesh Ganti, Arnd Christian König, and Rares Vernica, Entity Categorization over Large Document Collections, in 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Inc., 24 August 2008
- Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, and Venkatesh Ganti, Scalable Adhoc Entity Extraction from Text Collections, in VLDB Conference, 2008
- Surajit Chaudhuri, Anish Das Sarma, Venkatesh Ganti, and Raghav Kaushik, Leveraging Aggregate Constraints for Deduplication, in SIGMOD, Association for Computing Machinery, Inc., 2007
- Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, and Raghav Kaushik, Example Driven Design of Efficient Record Matching Queries, in VLDB, Very Large Data Bases Endowment Inc., 2007
- Arvind Arasu, Venkatesh Ganti, and Raghav Kaushik, Efficient Exact Set-Similarity Joins, in Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006, Very Large Data Bases Endowment Inc., August 2006
- Surajit Chaudhuri, Venkatesh Ganti, and Raghav Kaushik, A Primitive Operator for Similarity Joins in Data Cleaning, in ICDE, Institute of Electrical and Electronics Engineers, Inc., 2006
- Surajit Chaudhuri, Venkatesh Ganti, and Luis Gravano, Selectivity estimation for string predicates: Overcoming the underestimation problem, in ICDE, Institute of Electrical and Electronics Engineers, Inc., 2004
- Eugene Agichtein and Venkatesh Ganti, Mining reference tables for automatic text segmentation, in SIGKDD, Association for Computing Machinery, Inc., 2004
- Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, and Rajeev Motwani, Robust and efficient fuzzy match for online data cleaning, Association for Computing Machinery, Inc., 2003
- Rohit Ananthakrishna, Surajit Chaudhuri, and Venkatesh Ganti, Eliminating Fuzzy Duplicates in Data Warehouses, in VLDB, 2002