Semantics Sensitive Sampling for Probabilistic Programs

MSR-TR-2013-109 |

We present a new semantics sensitive sampling algorithm for probabilistic programs, which are “usual” programs endowed with statements to sample from distributions, and condition executions based on observations. Since probabilistic programs are executable, sampling can be performed by repeatedly executing them. However, in the case of programs with a large number of random variables and observations, naive execution does not produce high quality samples, and it takes an intractable number of samples in order to perform reasonable inference. Our MCMC algorithm called S3 tackles these problems using ideas from program analysis. First, S3 propagates observations back through the program in order to obtain a semantically equivalent program with conditional sample statements – this has the effect of preventing rejections due to executions that fail to satisfy observations. Next, S3 decomposes the probabilistic program into a set of straight-line programs, one for every valid program path, and performing Metropolis-Hastings sampling over each straight-line program independently. Sampling over straight-line programs has the advantage that random choices from previous executions can be re-used merely using the program counter (or line number) associated with each random choice. Finally, it combines the results from sampling each straight-line program (using appropriate weighting) to produce a result for the whole program. We formalize the semantics of probabilistic programs and rigorously prove the correctness of S3. We also empirically demonstrate the effectiveness of S3, and compare it with an importance sampling based tool over various benchmarks.