## Difficulty versus ability

This example is a model of how people answer questions on a multiple-choice test. It explicitly models the trade-off between a person's ability and the difficulty of the question. The model also lets you estimate the correct answer to each question, which is useful for crowdsourcing and generalizes majority voting. This model was used in the paper "How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing" by Bachrach et al. (ICML 2012), where it was called the DARE model. You can run this example in the Examples Browser.

In this model, there are multiple subjects who answer multiple questions, each having multiple choices. The data is simply an integer for each subject and question, describing the answer that was chosen. The following variables set this up:
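As a plain-Python illustration of the data layout described above (not Infer.NET code), the responses can be held as one integer per subject and question. The sizes and the names `n_subjects`, `n_questions`, `n_choices`, `response`, and `is_valid` are hypothetical, chosen only for this sketch.

```python
# Hypothetical toy sizes, for illustration only.
n_subjects, n_questions, n_choices = 3, 4, 4

# response[s][q] is the index (0 .. n_choices - 1) of the choice that
# subject s picked for question q.
response = [
    [0, 2, 1, 3],
    [0, 2, 2, 3],
    [1, 2, 1, 0],
]


def is_valid(response, n_subjects, n_questions, n_choices):
    """Check the shape and value range of a complete response grid."""
    return (
        len(response) == n_subjects
        and all(len(row) == n_questions for row in response)
        and all(0 <= r < n_choices for row in response for r in row)
    )
```

Every cell is filled in this layout, which matches the assumption that each subject answered each question.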
To explain the data, we introduce four different latent variables. For
each subject, we hypothesize a real-valued ability.
For each question, we hypothesize a real-valued difficulty.
Besides difficulty, a question may have high or low discrimination between
people of different abilities. For example, a question that is badly
worded may be misinterpreted by a fraction of the subjects, leading to noisy
answers regardless of the subject's ability. This is captured by a
real-valued discrimination for each question.
Finally, each question has an integer-valued trueAnswer.
The generative model now works as follows. For each subject and
question, the model starts from the difference between the subject's ability and the question's difficulty.
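The generative process can be written out as a short sketch, following the DARE model as it is usually described. The symbol names $\text{advantage}$, $\text{noisyAdvantage}$, and $\text{correct}$ are assumptions introduced here, $C$ denotes the number of choices, and discrimination is treated as the precision (inverse variance) of the noise.

$$
\begin{aligned}
\text{advantage}_{sq} &= \text{ability}_s - \text{difficulty}_q \\
\text{noisyAdvantage}_{sq} &\sim \mathcal{N}\!\left(\text{advantage}_{sq},\ \frac{1}{\text{discrimination}_q}\right) \\
\text{correct}_{sq} &= \left[\text{noisyAdvantage}_{sq} > 0\right] \\
\text{response}_{sq} &=
\begin{cases}
\text{trueAnswer}_q & \text{if } \text{correct}_{sq} \\
\text{Uniform}\{0, \dots, C-1\} & \text{otherwise}
\end{cases}
\end{aligned}
$$

Under these assumptions, the probability that a subject knows the answer is $\Phi\!\left((\text{ability}_s - \text{difficulty}_q)\sqrt{\text{discrimination}_q}\right)$, where $\Phi$ is the standard normal CDF. A low discrimination therefore pushes this probability toward $1/2$ regardless of ability, which matches the "noisy answers" behaviour of a badly worded question.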
To get robust inference in this model, some special settings are necessary; otherwise, it tends to generate improper message exceptions. The issue is that the model has highly correlated variables, yet we approximate it with a factorized distribution (see the page on Expectation Propagation). This leads to slow and unstable convergence. To help convergence, we instruct the scheduler to process subjects sequentially, so that all variables are updated after each subject, i.e. 40 times per iteration, rather than once per iteration. A nice benefit of these settings is that inference converges quickly, in fewer than 5 iterations.
To test the inference under this model, we generate a data set from known parameters and compare the learned parameters to the true ones. Notice that the Sample method has the same structure as the Infer.NET model. This is because the Infer.NET model is essentially a sampler, expressed using Infer.NET primitives instead of C#. The results are shown below. The estimated true answers and the difficulty and ability parameters are close to the true values. The discrimination parameters are recovered less accurately.
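To make the idea of sampling from known parameters concrete, here is a minimal plain-Python sketch of such a sampler. It is not the example's Sample method and does not use Infer.NET. The priors (standard normal for ability and difficulty, a Gamma with mean 1 and variance 0.01 for discrimination), the generative steps, and all names (`sample_dare`, `majority_vote`, `accuracy`) are assumptions made for illustration. A majority-vote estimate is included as a simple baseline that needs no inference.

```python
import random


def sample_dare(n_subjects, n_questions, n_choices, seed=0):
    """Draw parameters from assumed priors, then one response per (subject, question)."""
    rng = random.Random(seed)
    # Assumed priors (hypothetical, chosen for illustration).
    ability = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    difficulty = [rng.gauss(0.0, 1.0) for _ in range(n_questions)]
    # Gamma(shape=100, scale=0.01) has mean 1 and variance 0.01.
    discrimination = [rng.gammavariate(100.0, 0.01) for _ in range(n_questions)]
    true_answer = [rng.randrange(n_choices) for _ in range(n_questions)]

    response = []
    for s in range(n_subjects):
        row = []
        for q in range(n_questions):
            advantage = ability[s] - difficulty[q]
            # Discrimination is treated as a precision, so sd = precision ** -0.5.
            noise_sd = discrimination[q] ** -0.5
            knows = rng.gauss(advantage, noise_sd) > 0
            # A subject who does not know the answer guesses uniformly.
            row.append(true_answer[q] if knows else rng.randrange(n_choices))
        response.append(row)
    return response, ability, difficulty, discrimination, true_answer


def majority_vote(response, n_choices):
    """Most frequent choice per question; ties go to the lowest choice index."""
    votes = []
    for q in range(len(response[0])):
        counts = [0] * n_choices
        for row in response:
            counts[row[q]] += 1
        votes.append(counts.index(max(counts)))
    return votes


def accuracy(estimate, truth):
    """Fraction of positions where the estimate matches the truth."""
    return sum(e == t for e, t in zip(estimate, truth)) / len(truth)


if __name__ == "__main__":
    # 40 subjects as in the text; 100 questions and 4 choices are hypothetical.
    response, _, _, _, true_answer = sample_dare(40, 100, 4)
    print(accuracy(majority_vote(response, 4), true_answer))
```

Because the true parameters are returned alongside the data, any estimate of the true answers or of the latent parameters can be scored directly against them.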
Note that if the ability parameters are all equal, then the estimate of the true answers is identical to majority voting, since the most likely true answer is the answer that most subjects chose. Thus, to compare the results of this model to majority voting, just set the ability parameters to a constant. If you do this on this dataset, only 91% of the estimated trueAnswers are correct. Thus the ability parameters help to do better vote aggregation.

## How to handle missing data

The provided code assumes that every subject has answered every question. If this is not the case, then some changes are necessary. One approach is to leave the response array unobserved and apply constraints to the individual elements that were observed. Another approach is to use conditionals to skip over the missing elements, as explained in How to handle missing data. However, both of these are inefficient. The most efficient approach is to restructure the data as a collection of (subject, question, response) observations. Instead of looping over all subjects and questions, you loop only over the provided observations. The model becomes:
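As a plain-Python illustration of this restructuring (not Infer.NET code), the sketch below flattens a grid with missing entries into three parallel lists and then loops over the observations only. Marking missing answers with `None`, and the names `to_observations`, `per_observation_gap`, `subject_of`, `question_of`, and `response_of`, are assumptions made for this sketch.

```python
def to_observations(grid):
    """Flatten a subject-by-question grid into (subject, question, response) lists.

    Missing answers are assumed to be marked with None and are skipped.
    """
    subject_of, question_of, response_of = [], [], []
    for s, row in enumerate(grid):
        for q, r in enumerate(row):
            if r is not None:
                subject_of.append(s)
                question_of.append(q)
                response_of.append(r)
    return subject_of, question_of, response_of


def per_observation_gap(ability, difficulty, subject_of, question_of):
    """Ability minus difficulty for each observation.

    The loop runs once per observation, so unanswered (subject, question)
    pairs are never visited.
    """
    return [ability[s] - difficulty[q] for s, q in zip(subject_of, question_of)]
```

The per-subject and per-question parameters keep their original sizes; only the response data and the loop over it shrink to the number of observed answers.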
For an example of this approach, see the forum.