Web servers on the Internet need to maintain high reliability, but the causes
of intermittent failures in web transactions are non-obvious. We use approximate
Bayesian inference to diagnose problems with web services. This diagnosis problem
is far larger than any previously attempted: it requires inference of
10^{4} possible faults from 10^{5} observations. Further, such
inference must be performed in less than a second. This speed is achieved by
combining a mean-field variational approximation with stochastic gradient
descent, which is used to optimize the variational cost function. We use this
fast inference to diagnose a time series of anomalous HTTP requests taken from a
real web service. The inference is fast enough to analyze network logs with
billions of entries in a matter of hours.
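
As an illustration of the general technique named above, the following is a minimal sketch of mean-field variational inference optimized by stochastic gradient descent. It is not the model used in this work. It assumes a hypothetical toy setting: binary faults with a shared Bernoulli prior, a sparse 0/1 matrix `W` saying which faults can affect which observations, and a linear-Gaussian observation model chosen only because its expected log-likelihood has a closed form under a factorized posterior. All names (`make_toy_problem`, `fit_mean_field_sgd`) and all parameter values are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def make_toy_problem(n_obs=1000, n_faults=30, true_faults=(3, 11, 25),
                     noise_std=0.5, seed=0):
    # Hypothetical synthetic data: y_i = sum_j W_ij f_j + Gaussian noise.
    rng = np.random.default_rng(seed)
    W = (rng.random((n_obs, n_faults)) < 0.1).astype(float)
    f = np.zeros(n_faults)
    f[list(true_faults)] = 1.0
    y = W @ f + noise_std * rng.standard_normal(n_obs)
    return W, y

def fit_mean_field_sgd(W, y, prior=0.05, noise_var=0.25, lr=0.002,
                       epochs=40, batch=100, seed=1):
    # Factorized posterior q(f) = prod_j Bernoulli(q_j), with q_j = sigmoid(theta_j).
    # Cost (variational free energy):
    #   KL(q || prior) + (1 / (2 * noise_var)) * sum_i E_q[(y_i - sum_j W_ij f_j)^2]
    rng = np.random.default_rng(seed)
    n_obs, n_faults = W.shape
    prior_logit = np.log(prior / (1.0 - prior))
    theta = np.full(n_faults, prior_logit)  # start at the prior
    for _ in range(epochs):
        order = rng.permutation(n_obs)
        for start in range(0, n_obs, batch):
            idx = order[start:start + batch]
            Wb, yb = W[idx], y[idx]
            q = sigmoid(theta)
            resid = yb - Wb @ q
            # Rescale the minibatch sum so it estimates the full-data sum.
            scale = n_obs / len(idx)
            # E_q[(y - w.f)^2] = (y - w.q)^2 + sum_j w_j^2 q_j (1 - q_j);
            # this is its gradient with respect to q.
            grad_fit = scale * (-2.0 * Wb.T @ resid
                                + (Wb ** 2).sum(axis=0) * (1.0 - 2.0 * q))
            grad_fit /= 2.0 * noise_var
            # Gradient of KL(q || prior) with respect to q is logit(q) - logit(prior).
            grad_kl = theta - prior_logit
            # Chain rule through q = sigmoid(theta).
            theta -= lr * (grad_fit + grad_kl) * q * (1.0 - q)
    return sigmoid(theta)

W, y = make_toy_problem()
q = fit_mean_field_sgd(W, y)
detected = np.flatnonzero(q > 0.5).tolist()  # faults with posterior above 0.5
print(detected)
```

Each update touches only a minibatch of observations, so the cost per step does not grow with the total number of observations; this is the property that makes the combination attractive at large scale. The step size, batch size, and observation model here are illustrative choices and would need to be adapted to a real fault model.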