Prospect: Using Multiple Models to Understand Data

Kayur Patel, Steven Drucker, Ashish Kapoor, and Desney Tan

Abstract

A human's ability to diagnose errors, gather data, and generate features in order to build better models is largely untapped. We hypothesize that analyzing results from multiple models can help people diagnose errors by understanding relationships among data, features, and algorithms. These relationships might otherwise be masked by the bias inherent to any individual model. We demonstrate this approach in our Prospect system, show how multiple models can be used to detect label noise and aid in generating new features, and validate our methods in a pair of experiments.
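
As a rough illustration of the idea, and not the Prospect system's own method, the sketch below runs three deliberately simple models on a toy one-dimensional dataset and counts, for each example, how many of them misclassify it under leave-one-out evaluation. Examples that every model gets wrong are candidates for label noise. The models, the data, and the "all models disagree" threshold are assumptions made for this example.

```python
from collections import Counter

# Three hypothetical, deliberately simple models over 1-D features.
# Each takes training pairs (value, label) and a query value x.

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = Counter(), Counter()
    for value, label in train:
        sums[label] += value
        counts[label] += 1
    return min(counts, key=lambda lab: abs(x - sums[lab] / counts[lab]))

def one_nearest_neighbor(train, x):
    """Predict the label of the single closest training example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def three_nearest_neighbors(train, x):
    """Predict the majority label among the three closest examples."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:3]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

MODELS = [nearest_centroid, one_nearest_neighbor, three_nearest_neighbors]

def misclassification_counts(data):
    """Leave-one-out: per example, count the models that contradict its label."""
    counts = []
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        counts.append(sum(1 for model in MODELS if model(train, x) != label))
    return counts

# Toy data (an assumption): the point at 0.3 carries a label that looks wrong.
data = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (0.3, "b"),
        (2.0, "b"), (2.2, "b"), (2.4, "b")]

counts = misclassification_counts(data)
# Flag only the examples that every model misclassifies.
suspects = [data[i] for i, c in enumerate(counts) if c == len(MODELS)]
print(suspects)
```

Requiring agreement across all models is a conservative choice: an example that only one model gets wrong may reflect that model's bias rather than a bad label.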

*

Sorting and clustering large numbers of documents can be an overwhelming task: manual solutions tend to be slow, while machine learning systems often present results that don't align well with users' intents. We created and evaluated a system for helping users sort large numbers of documents into clusters. iCluster has the capability to recommend new items for existing clusters and appropriate clusters for items. The recommendations are based on a learning model that adapts over time: as the user adds more items to a cluster, the system's model improves and the recommendations become more relevant. Thirty-two subjects used iCluster to sort hundreds of data items both with and without recommendations; we found that recommendations allow users to sort items more rapidly. A pool of 161 raters then assessed the quality of the resulting clusters, finding that clusters generated with recommendations were of statistically indistinguishable quality. Both the manual and assisted methods were substantially better than a fully automatic method.

iCluster

Steven M. Drucker, Danyel Fisher, and Sumit Basu, Helping Users Sort Faster with Adaptive Machine Learning Recommendations, in Proceedings of Interact 2011, Springer, September 2011

*

Interactive clustering refers to situations in which a human labeler is willing to assist a learning algorithm in automatically clustering items. We present a related but somewhat different task, assisted clustering, in which a user creates explicit groups of items from a large set and wants suggestions on what items to add to each group. While the traditional approach to interactive clustering has been to use metric learning to induce a distance metric, our situation seems equally amenable to classification. Using clusterings of documents from human subjects, we found that one or the other method proved to be superior for a given cluster, but not uniformly so. We thus developed a hybrid mechanism for combining the metric learner and the classifier. We present results from a large number of trials based on human clusterings, in which we show that our combination scheme matches and often exceeds the performance of a method which exclusively uses either type of learner.

iClusterTheory

Sumit Basu, Danyel Fisher, Steven M. Drucker, and Hao Lu, Assisting Users with Clustering Tasks by Combining Metric Learning and Classification, in Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010), American Association for Artificial Intelligence, July 2010

Details

Publication type: Inproceedings
Published in: IJCAI 2011
Publisher: ACM