Data Management, Exploration and Mining (DMX)

Overview

The Data Platforms and Analytics pillar currently consists of the Data Management, Exploration and Mining (DMX) group, which focuses on solving key problems in information management. Our current areas of focus are infrastructure for large-scale cloud database systems, reducing the total cost of ownership of information management, enabling flexible ways to query, browse and organize rich data sets containing both structured and unstructured data, and the management of database schemas and mappings.

Our research focuses on projects that produce practical software. Our software has shipped in many Microsoft products and services, including the Database Tuning Advisor (in SQL Server), the Fuzzy Lookup and Fuzzy Grouping operators (in Microsoft SQL Server Integration Services (SSIS), and also used in Bing Maps and Bing Shopping), the mapping compiler for Microsoft’s ADO.NET Entity Framework, the schema-matching algorithm in Microsoft’s BizTalk Mapper, click-through prediction in Bing search, and the advertisement indexing engine in search advertising.

Our research has also had significant impact in the academic community. We publish in the top conferences in the areas of systems, information retrieval, and database management (SIGMOD, VLDB, SIGKDD, SIGIR, WWW, ICDE, CIDR, etc.). Our work has earned two VLDB 10-Year Best Paper Awards, Best Paper Awards at SIGMOD, VLDB, ICDE and CIDR, and an ICDE Influential Paper Award.

 

News

Sudipto Das has won the 2013 SIGMOD Jim Gray Doctoral Dissertation Award, which honors the best PhD thesis in database systems of the past year, for his U.C. Santa Barbara thesis titled "Scalable and Elastic Transactional Data Stores for Cloud Computing Platforms."

Herodotos Herodotou received an Honorable Mention for the ACM SIGMOD 2013 Dissertation Award, in recognition of his dissertation titled "Automatic Tuning of Data-Intensive Analytical Workloads".

 

Projects

The DMX group is currently active in the following projects:

  • Autoadmin: Tools to reduce the high total cost of ownership of database systems by making them self-tuning and self-administering.
  • Data Cleaning: Techniques for data cleaning tasks such as record matching, de-duplication, and column segmentation on large data sets.
  • Data Exploration: Techniques for flexible ways to query, browse and rank data, bridging the divide between structured (i.e., record-based) and unstructured (e.g., free text) data.
  • Data Mining: Statistical and machine learning techniques over enterprise databases.
  • Hyder: A transactional indexed-record manager for shared flash.
  • Model Management: Tools for managing database schemas and mappings.
  • SQLVM: Performance isolation in multi-tenant relational Database-as-a-Service.