An interesting subset of problems in computer vision requires the inference of a continuous-valued quantity from image data. This dissertation describes the visual inference machine (VIM), a general method for learning the mapping from image data to a continuous output space using the rules of Bayesian inference. The learning is performed without the need to define a generative model of image formation, the benefit being faster inference for real-time applications. The disadvantage of this method is that it requires a set of training data from which the VIM learns the mapping, and such data can be costly to collect and label. An extension to the VIM is therefore also introduced, the semi-supervised visual inference machine (SS-VIM), which does not require the training data to be fully labelled. The issue of how best to filter an image for optimal inference is also covered, and it is shown that the VIM or SS-VIM can easily learn mappings using a mixture of image features and automatically select those that are most useful. The VIM and SS-VIM are demonstrated for visual region tracking, for human–computer interaction (e.g., gaze tracking and gesture-based interfaces) and for mapping images to points on a manifold. Lastly, this dissertation addresses the issue of outlying observations, on both a per-image and a per-pixel basis. For the latter, the variational Ising classifier (VIC) algorithm is developed, which places a prior over outlying pixels that models their spatial coherence.