﻿ The R2 Probabilistic Programming System - Microsoft Research
R2: A Probabilistic Programming System

# What is R2?

R2 is a probabilistic programming system that uses powerful techniques from program analysis and verification for efficient Markov Chain Monte Carlo (MCMC) inference. The language that is used to describe probabilistic models in R2 is based on C#. R2 compiles the given model into executable code to generate samples from the posterior distribution. The inference algorithm currently implemented in R2 is a variation of the Metropolis-Hastings sampling algorithm.

# Examples

Let us see how to use R2 for performing inference on probabilistic models by means of an example. We’ll consider the famous Monty Hall problem for this. The Monty Hall problem can be stated as follows:

Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors?

- Marilyn vos Savant, Parade Magazine, 1990

 (Source: xkcd.com)

We need to find the probabilities of the contestant winning the car when they don’t change their choice and also when they change their choice. For this, we would have to construct a model. This model can be represented in R2 by means of the following program:

``` 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 using MicrosoftResearch.R2Lib; using MicrosoftResearch.R2Lib.Distributions; class MontyHall { public Tuple<bool, bool> Infer() { double[] probabilities = new double[] { 1.0/3, 1.0/3, 1.0/3 }; // Set up the game. int carDoorNum = Categorical.Sample(probabilities); // Contestant picks a door. int chosenDoorNum = Categorical.Sample(probabilities); // Host picks a door. int[] possibleOpenDoors; int openedDoorNum = -1; int i, j; if (carDoorNum == chosenDoorNum) { possibleOpenDoors = new int[2]; j = 0; for (i = 0; i < 3; ++i) { if (i != carDoorNum) { possibleOpenDoors[j] = i; ++j; } } j = Categorical.Sample(new double[] { 0.5, 0.5 }); openedDoorNum = possibleOpenDoors[j]; } else { for (i = 0; i < 3; ++i) { if ((i != carDoorNum) && (i != chosenDoorNum)) { openedDoorNum = i; } } } bool chosenDoorContainsCar = false, otherDoorContainsCar = false; // Does contestant win if s/he doesn't switch? chosenDoorContainsCar = (chosenDoorNum == carDoorNum); // Does contestant win if s/he switches? for (i = 0; i < 3; ++i) { if ((i != chosenDoorNum) && (i != openedDoorNum)) { otherDoorContainsCar = (i == carDoorNum); } } return new Tuple<bool, bool>(chosenDoorContainsCar, otherDoorContainsCar); } } ```

In line 11, we initialize the game by choosing the door behind which there would be a car. Behind the other two doors, there would be goats. This is done by randomly picking a door for the car. Since each door has equal probability of getting picked, this is modeled by drawing a sample from a Categorical distribution with 3 values. The probability of picking each value is 1/3.

Next, in line 14, we model the contestant picking a door. Again, each door has equal probability of getting picked. As previously, this is specified by drawing a sample from a Categorical distribution with 3 values where the probability of picking each value is 1/3.

In lines 17-46, we make the host open one of the other two doors behind which there is a goat. If the contestant has chosen the door behind which the car is present, then the host picks one of the other two doors with equal probability. Otherwise, there is only one door behind which there is a goat and the host picks this door.

Finally, we check whether the contestant wins the car by not changing their choice or by changing it. These values are stored in the boolean variables chosenDoorContainsCar and otherDoorContainsCar. Both these variables are returned in a tuple in line 62.

Now, let us use R2 to perform inference on this model. In a command prompt, type the following:

``` > R2 MontyHall.cs /numSamples:1000000```

R2 would generate samples from the given model and compute the mean and variance of the variables of interest from the generated samples. In this case, R2 would print the mean and variance of each element of the returned tuple. You should get an output similar to the following:

``` <0> Mean: 0.332854 <0> Variance: 0.222062 <1> Mean: 0.667146 <1> Variance: 0.222062```

As is clear from the above output, the contestant has a higher chance of winning the car if they change their choice.

More examples can be found in the examples/ directory of the R2 installation location.

For more information on using R2, please refer to the R2 user guide in the docs/ directory.

# Limitations

R2 currently has the following limitations which we are working towards fixing in future releases:

1. Metropolis-Hastings sampling algorithm is the only inference algorithm that is currently available.
2. The convergence characteristics of the Metropolis-Hastings algorithm depend upon the proposal distribution that is used. We use a heuristic to tune the proposal distribution. There is no facility that allows the user to tune the proposal distribution.
3. Multi-variate distributions are not supported.

# Team

### Collaborators

• Johannes Borgström (Uppsala University)
• Andy Gordon (MSR Cambridge)
• Gil Hur (Seoul National University)
• Samin Ishtiaq (MSR Cambridge)
• Claudio Russo (MSR Cambridge)

### Interns

• Arun Tejasvi Chaganty (Stanford University)
• Guillaume Claret (ENS Paris)
• Vijay D'Silva (UC Berkeley)
• Christian von Essen (Verimag)
• Sulekha Kulkarni (Georgia Tech)
• Sherjil Ozair (IIT Delhi)
• Robert J. Simmons (CMU)

# Papers

• Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In ICSE '14: International Conference on Software Engineering (FOSE track). May 2014.

Our other papers on probabilistic programming are listed below:

• Aditya V. Nori, Sherjil Ozair, Sriram K. Rajamani, and Deepak Vijaykeerthy. Efficient Synthesis of Probabilistic Programs. In PLDI '15: Programming Languages Design and Implementation. June 2015.
• Chung-Kil Hur, Aditya V. Nori, Sriram K. Rajamani, and Selva Samuel. Slicing Probabilistic Programs. In PLDI '14: Programming Languages Design and Implementation. June 2014.
• Guillaume Claret, Sriram K. Rajamani, Aditya V. Nori, Andrew D. Gordon and Johannes Borgström. Bayesian Inference Using Data Flow Analysis. In ESEC-FSE '13: Foundations of Software Engineering. August 2013.
• Arun T. Chaganty, Aditya V. Nori, and Sriram K. Rajamani. Efficiently Sampling Probabilistic Programs via Program Analysis. In AISTATS '13: Artificial Intelligence and Statistics. April 2013.
• Andrew D. Gordon, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic Inference using Program Analysis. In OBT '13: Off the Beaten Track Workshop. January 2013.
• Andrew D. Gordon, Mihhail Aizatulin, Johannes Borgström, Guillaume Claret, Thore Graepel, Aditya V. Nori, Sriram K. Rajamani, and Claudio Russo. A Model-Learner Pattern for Bayesian Reasoning. In POPL '13: Principles of Programming Languages. January 2013.

# Contact

If you face any issues using R2, or have questions/comments regarding it, please feel free to contact us at: r2pp AT Microsoft DOT com.