Learning Monotonic Linear Functions

  • Adam Tauman Kalai

17th Annual Conference on Learning Theory, COLT 2004, Banff, Canada, July 1-4, 2004

Published by Springer Berlin Heidelberg


Learning probabilities (p-concepts [13]) and other real-valued concepts (regression) is an important task in machine learning. For example, a doctor may need to predict the probability P[y|x] of getting a disease, which depends on a number of risk factors.

Generalized additive models [9] are a well-studied nonparametric model in the statistics literature, usually with monotonic link functions. However, no efficient algorithms are known for learning such a general class. We show that regression graphs efficiently learn such real-valued concepts, while regression trees learn them inefficiently. One corollary is that any function E[y|x] = u(w · x), for u monotonic, can be learned to arbitrarily small squared error ε in time polynomial in 1/ε, the L1 norm |w|_1, and the Lipschitz constant of u (analogous to a margin). The model includes, as special cases, linear and logistic regression, as well as learning a noisy half-space with a margin [5, 4].
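To make the model class concrete, the following is a minimal sketch that generates data with E[y|x] = u(w · x) for a monotonic, Lipschitz link u and fits the logistic-regression special case mentioned above. The choice of a scaled logistic link, the use of NumPy, and the plain gradient-descent fit are illustrative assumptions, not the regression-graph algorithm of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model class: E[y|x] = u(w . x) for a monotonic link u.
d, n = 5, 2000
w_true = rng.normal(size=d)
w_true /= np.abs(w_true).sum()            # normalize so the L1 norm |w|_1 = 1

def u(z):
    # Monotonic link; its slope is at most 1, so the Lipschitz constant is 1.
    return 1.0 / (1.0 + np.exp(-4.0 * z))

X = rng.normal(size=(n, d))
p = u(X @ w_true)                          # conditional mean E[y|x]
y = rng.binomial(1, p).astype(float)       # noisy 0/1 labels drawn with probability p

# Because this particular u is a rescaled logistic link, ordinary logistic
# regression fit by gradient descent recovers E[y|x]; this illustrates only the
# "logistic regression" special case, not the general monotonic-link result.
w_hat = np.zeros(d)
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w_hat)))
    w_hat -= 0.5 * X.T @ (p_hat - y) / n   # gradient step on the logistic loss

squared_error = np.mean((1.0 / (1.0 + np.exp(-(X @ w_hat))) - p) ** 2)
print(f"squared error against the true conditional mean: {squared_error:.4f}")
```

With u set to the identity instead, the same setup reduces to linear regression, the other special case noted above.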

Kearns, Mansour, and McAllester [12, 15] analyzed decision trees and decision graphs as boosting algorithms for classification accuracy. We extend their analysis and the boosting analogy to the case of real-valued predictors, where a small positive correlation coefficient can be boosted to arbitrary accuracy. Viewed as a noisy boosting algorithm [3, 10], the algorithm learns both the target function and the asymmetric noise.
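As a generic illustration of the boosting view, and not the regression-graph construction analyzed in the paper, the sketch below stage-wise fits depth-1 regression stumps to residuals. Each stump is only weakly correlated with the target, yet summing many small steps drives the squared error down; the stump fitter, step size, and synthetic target are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1000
x = rng.uniform(-1, 1, size=(n, 2))
target = np.sin(3 * x[:, 0]) + 0.5 * x[:, 1]          # smooth target E[y|x]
y = target + 0.1 * rng.normal(size=n)                 # noisy observations

def best_stump(x, r):
    """Return (feature, threshold, left mean, right mean) minimizing squared error on residuals r."""
    best, best_err = None, np.inf
    for j in range(x.shape[1]):
        for t in np.quantile(x[:, j], np.linspace(0.1, 0.9, 9)):
            left, right = r[x[:, j] <= t], r[x[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, left.mean(), right.mean())
    return best

pred = np.zeros(n)
for _ in range(200):
    j, t, lo, hi = best_stump(x, y - pred)            # fit a weak predictor to residuals
    pred += 0.1 * np.where(x[:, j] <= t, lo, hi)      # add a small step to the ensemble

print(f"squared error against the target: {np.mean((pred - target) ** 2):.4f}")
```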