Kaushik Chakrabarti, Surajit Chaudhuri, and Seung-won Hwang
Exploratory ad-hoc queries can return too many answers, a
phenomenon commonly referred to as “information overload”.
In this paper, we propose to automatically categorize the results
of SQL queries to address this problem. We dynamically generate
a labeled, hierarchical category structure: a user can determine
whether a category is relevant simply by examining its label,
and can then explore just the relevant categories
and ignore the remaining ones, thereby reducing information
overload. We first develop analytical models to estimate the
information overload faced by a user for a given exploration.
Based on those models, we formulate the categorization problem
as a cost optimization problem and develop heuristic algorithms
to compute the min-cost categorization.
In ACM SIGMOD Conference
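
The cost-based formulation summarized in the abstract can be illustrated with a small sketch. It is not the paper's model: the abstract does not state the cost formulas, so the `Category` class, the probabilities `p_explore` and `p_show_tuples`, the `label_cost` parameter, and the one-level `pick_min_cost` selection are all hypothetical names and assumptions introduced here. The sketch treats "information overload" as the expected number of items (category labels plus tuples) a user examines, and picks the candidate categorization with the lowest expected cost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """A node in a hypothetical category tree over a query result."""
    label: str
    n_tuples: int                # tuples of the result that fall under this node
    p_explore: float = 1.0       # assumed probability the user explores this node
    p_show_tuples: float = 1.0   # assumed probability the user lists all tuples
                                 # here instead of browsing the subcategories
    children: List["Category"] = field(default_factory=list)


def expected_cost(node: Category, label_cost: float = 1.0) -> float:
    """Expected number of items examined when the user explores `node`."""
    if not node.children:
        # A leaf has no labels to browse: the user reads every tuple.
        return float(node.n_tuples)
    # Browsing: read each child label, then explore each child with its
    # own probability.
    browse = label_cost * len(node.children) + sum(
        c.p_explore * expected_cost(c, label_cost) for c in node.children
    )
    return (node.p_show_tuples * node.n_tuples
            + (1.0 - node.p_show_tuples) * browse)


def pick_min_cost(candidates: List[Category], label_cost: float = 1.0) -> Category:
    """Return the candidate categorization with the lowest expected cost."""
    return min(candidates, key=lambda c: expected_cost(c, label_cost))


# Two made-up ways to categorize the same 1000-tuple result.
by_area = Category("all", 1000, p_show_tuples=0.2, children=[
    Category("area: north", 600, p_explore=0.5),
    Category("area: south", 400, p_explore=0.5),
])
by_price = Category("all", 1000, p_show_tuples=0.2, children=[
    Category("price: low", 100, p_explore=0.9),
    Category("price: high", 900, p_explore=0.1),
])
flat = Category("all", 1000)  # no categorization at all

best = pick_min_cost([flat, by_area, by_price])
```

Under these made-up probabilities, the split whose labels let the user skip most tuples has the lowest expected cost. A full categorizer would apply such a choice recursively at each level of the hierarchy, and the probabilities would have to be estimated from some source the abstract does not specify.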