Querying Uncertain Data with Aggregate Constraints

  • Mohan Yang ,
  • Haixun Wang ,
  • Haiquan Chen ,
  • Wei-Shinn Ku

ACM International Conference on Management of Data (SIGMOD) |

Data uncertainty arises in many situations. A common approach to query processing over uncertain data is to sample many “possible worlds” from the uncertain data, and run queries against the possible worlds. However, sampling is not a trivial task, as a randomly sampled possible world may not satisfy known constraints imposed on the data. In this paper, we focus on an important category of constraints, the aggregate constraints. An aggregate constraint is placed on a set of records instead of on a single record, and a real-life system usually has a large number of aggregate constraints. It is a challenging task to find qualified possible worlds in this scenario, as tuple by tuple sampling is extremely inefficient as it rarely leads to a qualified possible world. In this paper, we introduce two approaches for querying uncertain data with aggregate constraints, namely constraint aware sampling and MCMC sampling. Our experiments show that the new approaches lead to high quality query results with reasonable cost.