SEER: MAXIMUM LIKELIHOOD REGRESSION
FOR LEARNING-SPEED CURVES
BY
CARL MYERS KADIE
B.S., University of Illinois, 1985
M.S., University of Illinois, 1989
THESIS
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
in Computer Science
in the Graduate College of the
University of Illinois at Urbana-Champaign, 1995
Urbana, Illinois
Seer: Maximum Likelihood Regression
for Learning-Speed Curves
Carl Myers Kadie
Department of Computer Science
University of Illinois at Urbana-Champaign, 1995
David C. Wilkins, Advisor
The research presented here focuses on modeling machine-learning performance. The thesis introduces Seer, a system that generates empirical observations of classification-learning performance and then uses those observations to create statistical models. The models can be used to predict the number of training examples needed to achieve a desired level of accuracy and the maximum accuracy possible given an unlimited number of training examples. Seer advances the state of the art with 1) models that embody the best constraints for classification learning and the most useful parameters, 2) algorithms that efficiently find maximum-likelihood models, and 3) a demonstration, on real-world data from three domains, of a practicable application of such modeling.
The first part of the thesis gives an overview of the requirements for a good maximum-likelihood model of classification-learning performance. Next, reasonable design choices for such models are explored. Selection among such models is a task of nonlinear programming, but by exploiting appropriate problem constraints, the task is reduced to a nonlinear regression task that can be solved with an efficient iterative algorithm. The latter part of the thesis describes almost 100 experiments in the domains of soybean disease, heart disease, and audiological problems. The tests show that Seer is excellent at characterizing learning performance and that it seems to be as good as possible at predicting learning performance. Finally, recommendations are made for choosing a regression model for a particular situation, and directions for further research are identified.
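To make the idea of maximum-likelihood regression on a learning-speed curve concrete, the following is a minimal sketch and not Seer's actual model or algorithm, which are developed in the chapters that follow. It assumes a hypothetical exponential curve that rises from a fixed chance-level accuracy toward an asymptote, a binomial likelihood for the observed test results, and a simple coordinate search as the iterative fitting step. The function names, the curve form, and the synthetic observations are all illustrative assumptions.

```python
import math

def accuracy(n, a_inf, k, a0):
    """Assumed curve: accuracy rises from chance level a0 toward asymptote a_inf."""
    return a_inf - (a_inf - a0) * math.exp(-k * n)

def log_likelihood(params, data, a0):
    """Binomial log-likelihood (up to a constant) of (n, correct, trials) rows."""
    a_inf, k = params
    total = 0.0
    for n, correct, trials in data:
        # Clamp the probability away from 0 and 1 so the logarithms stay finite.
        p = min(max(accuracy(n, a_inf, k, a0), 1e-9), 1 - 1e-9)
        total += correct * math.log(p) + (trials - correct) * math.log(1 - p)
    return total

def fit(data, a0, start=(0.8, 0.01), step=(0.1, 0.01), rounds=500):
    """Coordinate search: try +/- step on each parameter, halve the steps when stuck."""
    best = list(start)
    best_ll = log_likelihood(best, data, a0)
    step = list(step)
    for _ in range(rounds):
        improved = False
        for i in range(2):
            for sign in (1, -1):
                trial = list(best)
                trial[i] += sign * step[i]
                # Constraints: asymptote lies in (a0, 1], rate is positive.
                if not (a0 < trial[0] <= 1.0 and trial[1] > 0):
                    continue
                ll = log_likelihood(trial, data, a0)
                if ll > best_ll:
                    best, best_ll, improved = trial, ll, True
        if not improved:
            step = [s / 2 for s in step]
    return tuple(best)

def examples_needed(target, a_inf, k, a0):
    """Invert the curve; returns None if the target is at or above the asymptote."""
    if target >= a_inf:
        return None
    if target <= a0:
        return 0
    return math.ceil(math.log((a_inf - a0) / (a_inf - target)) / k)

if __name__ == "__main__":
    # Synthetic observations generated from the assumed curve itself (not real data).
    truth = (0.9, 0.02)
    a0 = 0.25  # chance level for four equally likely classes (an assumption)
    data = [(n, round(1000 * accuracy(n, *truth, a0)), 1000)
            for n in (10, 20, 50, 100, 200, 400)]
    a_inf, k = fit(data, a0)
    print("estimated maximum accuracy:", round(a_inf, 3))
    print("examples needed for 85% accuracy:", examples_needed(0.85, a_inf, k, a0))
```

In this sketch the fitted asymptote plays the role of the maximum accuracy possible given unlimited training examples, and inverting the fitted curve gives the number of training examples needed for a target accuracy. A target above the asymptote is reported as unreachable.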
Acknowledgments
Thanks to Nanci, my wife, for encouragement, support, patience, and love. Thanks to Benjamin, my son, for an extra boost of motivation. Thanks, also, to family and friends for their encouragement.
Thanks to my thesis advisor, David C. Wilkins, for his guidance. Thanks to James Edstrom, Ziad Najem, and Gunner Blix for reviewing drafts of this thesis. Thanks also to my thesis committee.
This research was conducted at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Support was provided by the Fannie and John Hertz Foundation. This research was also supported in part by AFOSR grant F49260-92-J-0545, ONR grant N00014-88-K-0124, and ONR grant N00014-94-1-0432.
Table of Contents
CHAPTER
1. Introduction
1.1. Inductive Classification Learning
1.2. Regression on Inductive Learning
1.3. Overview of Thesis
2. Related Work
2.1. Theoretical Approaches: Computational Learning Theory
2.2. Empirical Approaches
2.3. Effect of Skew and Multiple Classes on Learning Performance
2.4. Summary
3. Overview of Learning-Performance Models
3.1. Good-Fitting Models of Learning-Performance
3.2. Generalized Cross-Validation
3.3. Conclusion
4. Candidate Models of Learning Performance: Design and Selection Method
4.1. Candidate Deterministic Models
4.2. Modeling the Effect of Multiple Classes, Skewed Classes, and Noise
4.3. Nondeterministic Design Choices
4.4. Fitting Models to Data Efficiently
4.5. Summary
5. Experimental Procedure and Results