Learning with Uncertainty – Gaussian Processes and Relevance Vector Machines

  • Joaquin Quiñonero Candela

This thesis is concerned with Gaussian Processes (GPs) and Relevance Vector Machines (RVMs), both of which are particular instances of probabilistic linear models. We look at both models from a Bayesian perspective, and are forced to adopt an approximate Bayesian treatment of learning for two reasons. The first reason is the analytical intractability of the full Bayesian treatment and the fact that we in principle do not want to resort to sampling methods. The second reason, which incidentally justifies our not wanting to sample, is that we are interested in computationally efficient models. Computational efficiency is obtained through sparseness: sparse linear models have a significant number of their weights set to zero. For the RVM, which we treat in Chap. 2, we show that it is precisely the particular choice of Bayesian approximation that enforces sparseness. Probabilistic models have the important property of producing predictive distributions instead of point predictions. We also show that the resulting sparse probabilistic model implies counterintuitive priors over functions, and ultimately inappropriate predictive variances: the model becomes more certain about its predictions the further away they are from the training data. We propose the RVM*, a modified RVM that provides significantly better predictive uncertainties.

RVMs happen to be a particular case of GPs, the latter having superior performance and being non-sparse, non-parametric models. For completeness, in Chap. 3 we study a particular family of approximations to Gaussian Processes, Reduced Rank Gaussian Processes (RRGPs), which take the form of finite extended linear models; we show that GPs are in general equivalent to infinite extended linear models. We also show that RRGPs result in degenerate GPs, which suffer, like RVMs, from inappropriate predictive variances. We solve this problem by proposing a modification of the classic RRGP approach, in the same guise as the RVM*.

In the last part of this thesis we move on to the problem of uncertainty in the inputs. Until now the inputs were considered deterministic, as is common practice. We derive the equations for predicting at an uncertain input with GPs and RVMs, and use this to propagate the uncertainty in recursive multi-step ahead time-series predictions. This allows us to obtain sensible predictive uncertainties when recursively predicting k steps ahead, while standard approaches that ignore the accumulated uncertainty are severely overconfident. Finally, we explore a much harder problem: that of training with uncertain inputs. We explore approximating the full Bayesian treatment, which involves an analytically intractable integral, and propose two preliminary approaches. The first one tries to "guess" the unknown "true" inputs, and requires careful optimisation to avoid over-fitting; it also requires prior knowledge of the output noise, which is limiting. The second approach consists in sampling from the posterior over the inputs and optimising the hyperparameters. Sampling severely increases the computational cost, which again is limiting. However, the success in toy experiments is encouraging, and should motivate future research.
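As a minimal sketch of what "predicting at an uncertain input" entails (the notation here is ours, not copied from the thesis): if the test input x_* is only known up to a Gaussian distribution N(x_* | u, \Sigma_x), the quantity of interest is the usual GP or RVM predictive distribution averaged over that input distribution,

    p(y_* | D, u, \Sigma_x) = \int p(y_* | D, x_*) \, N(x_* | u, \Sigma_x) \, dx_* ,

an integral that is analytically intractable in general; it is the mean and variance of this averaged distribution that one propagates when predicting recursively several steps ahead.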
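The effect of ignoring versus propagating the accumulated uncertainty in recursive multi-step ahead prediction can also be illustrated numerically. The snippet below is not the thesis' method (the thesis derives analytical expressions); it is a hypothetical sketch that contrasts the standard plug-in approach with a simple Monte Carlo propagation, using an off-the-shelf GP implementation and a toy autoregressive series chosen purely for illustration.

```python
# Hypothetical sketch: recursive k-step-ahead prediction with a GP, comparing the
# naive approach (feed back the predictive mean, ignore its uncertainty) with a
# Monte Carlo scheme that propagates the accumulated uncertainty. Model, data and
# the one-step map y_{t+1} = sin(y_t) + noise are assumptions of this example.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy one-dimensional time series.
y = [0.5]
for _ in range(200):
    y.append(np.sin(y[-1]) + 0.05 * rng.standard_normal())
y = np.asarray(y)
X_train, y_train = y[:-1, None], y[1:]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

def naive_recursive(x0, k):
    """Plug the predictive mean back in; the reported std ignores input uncertainty."""
    x = np.array([[x0]])
    for _ in range(k):
        mean, std = gp.predict(x, return_std=True)
        x = mean.reshape(1, 1)
    return mean.item(), std.item()

def monte_carlo_recursive(x0, k, n_samples=500):
    """Propagate uncertainty by sampling each next input from the predictive."""
    xs = np.full(n_samples, x0, dtype=float)
    for _ in range(k):
        mean, std = gp.predict(xs[:, None], return_std=True)
        xs = rng.normal(mean, std)
    return xs.mean(), xs.std()

print(naive_recursive(0.5, k=10))        # overconfident: small predictive std
print(monte_carlo_recursive(0.5, k=10))  # wider, more honest predictive std
```

The contrast between the two reported standard deviations is the point the abstract makes: the plug-in scheme stays overconfident as the horizon grows, while propagating the uncertainty yields error bars that widen with the number of steps.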