Christopher M. Bishop

Neural Networks and Machine Learning


1997 NATO Advanced Study Institute

Christopher M. Bishop (Editor)

Springer (1998)

The contents pages and list of contributors are available here in PDF format.


Preface

From July to December 1997, the Isaac Newton Institute for Mathematical Sciences in Cambridge was host to a major international programme entitled Neural Networks and Machine Learning. Many of the world's leading researchers in the field participated for periods ranging from a few weeks up to six months, and numerous younger scientists also benefited from a variety of conferences and workshops held throughout the programme. The Newton Institute's purpose-designed building provided a superb research environment, as well as an excellent venue for workshops.

The first workshop of the six month programme was a two-week NATO Advanced Study Institute on Generalization in Neural Networks and Machine Learning. This was heavily over-subscribed and attendance was limited to around 90 by the capacity of the Institute as well as by the desire to maintain an informal, interactive atmosphere. The topic of generalization was chosen as a focal point for the workshop and provided a common theme running through many of the presentations. This book resulted directly from the NATO ASI, and many of the chapters have a significant tutorial component, reflecting the instructional aims of the workshop.

Part 1 of the book, Statistical Foundations, deals with statistical principles and theoretical analyses which underpin current research in neural networks and machine learning, as well as with techniques for assessing the performance of pattern recognition systems. The first chapter, by Ripley, reviews several approaches to the assessment of generalization performance from both theoretical and practical viewpoints. Breiman then discusses the decomposition of the sum-of-squares error into bias and variance components, and uses simple examples to illustrate the insight which this decomposition can provide into the problem of generalization and model complexity optimization. Next, Sontag introduces the Vapnik-Chervonenkis (VC) dimension, a measure of the capacity of a class of functions, and shows how this quantity can be computed in the context of binary classification problems for various neural network models. Buhmann and Tishby then apply computational learning theory to the analysis of clustering algorithms, leading to a criterion for determining cluster splits. The first part of the book is concluded by Neal, who discusses the empirical assessment of learning algorithms through the framework of the DELVE project, and illustrates this framework with an application to the technique of automatic relevance determination.

Part 2 of the book, Algorithms and Architectures, surveys a variety of current approaches to pattern recognition. MacKay gives an introductory tutorial on the increasingly popular formalism of Gaussian processes, discussing relations to earlier techniques, the choice and adaptation of the covariance function, and applications to regression and classification. Zhu, Williams, Rohwer and Morciniec then present an analysis of Gaussian process regression, showing how the optimal finite-dimensional model under a Gaussian process prior can be expressed in terms of an infinite-dimensional principal component decomposition. The next two chapters deal with variational techniques. Jaakkola and Jordan introduce the framework of variational methods for approximate inference in dense graphical models, illustrating the technique using a medical diagnostic database. Variational methods are then applied to the problem of learning in neural networks by Barber and Bishop, who demonstrate a tractable solution for general Gaussian approximations to the posterior distribution over parameters. Vapnik then introduces the support vector technique for regression and classification and for solving linear operator equations, demonstrating that good results can be obtained using finite data sets in spaces of very high dimensionality. Finally, a very different viewpoint is adopted by Baum who discusses an economic model of intelligence in which interacting agents partition and solve complex problems.

This book owes much to the lecturers at the NATO ASI, and I would like to thank them for their contributions to the workshop as well as for making the additional effort needed to prepare this volume. I would also like to thank Joachim Buhmann, Geoffrey Hinton and Michael Jordan for their help in organizing the NATO ASI, as well as David Haussler, Geoffrey Hinton, Mahesan Niranjan and Leslie Valiant for their assistance in running the overall six month Newton Institute programme.

I am also grateful to Tim Perkins from the Department of Applied Mathematics and Theoretical Physics at Cambridge for his contributions to the typesetting of this book in LaTeX.

Finally, I would like to express my sincere thanks to the staff of the Isaac Newton Institute, for their energy, enthusiasm and support throughout the six month programme.

Christopher M. Bishop
August 1998
