Charles Sutton and Tom Minka
Because maximum-likelihood training is intractable for general factor graphs, an appealing alternative is local training, which approximates the likelihood gradient without performing global propagation on the graph. We discuss two new local training methods: shared-unary piecewise, in which unary factors are shared among every higher-way factor that they neighbor, and the one-step cutout method, which computes exact marginals on overlapping subgraphs. Comparing them to naive piecewise training, we show that just as piecewise training corresponds to using the Bethe pseudomarginals after zero BP iterations, shared-unary piecewise corresponds to the pseudomarginals after one parallel iteration, and the one-step cutout method corresponds to the beliefs after two iterations. We show in simulations that this point of view illuminates the errors made by shared-unary piecewise.
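The zero-iteration case can be checked on a single factor. The following is a minimal sketch, not the authors' code: it assumes the standard BP factor belief, b(y) proportional to psi(y) times the product of incoming messages, and it assumes that piecewise training normalizes each factor locally. The factor table `psi` and the helper names `normalize_table` and `factor_belief` are hypothetical and chosen only for illustration.

```python
def normalize_table(table):
    """Rescale a dict of nonnegative scores so that the values sum to 1."""
    total = sum(table.values())
    return {key: value / total for key, value in table.items()}


def factor_belief(psi, messages):
    """Factor belief b(y) proportional to psi(y) * prod_i m_i(y_i).

    psi maps assignments (tuples) to scores; messages[i] is the incoming
    message for the i-th variable of the factor.
    """
    unnormalized = {}
    for assignment, score in psi.items():
        weight = score
        for message, value in zip(messages, assignment):
            weight *= message[value]
        unnormalized[assignment] = weight
    return normalize_table(unnormalized)


# Hypothetical pairwise factor over two binary variables (illustrative values).
psi = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

# Zero BP iterations: every incoming message is still at its uniform
# initialization, so the messages contribute nothing to the belief.
uniform = {0: 1.0, 1: 1.0}
belief_zero_iters = factor_belief(psi, [uniform, uniform])

# Assumed piecewise-style local distribution: the factor normalized on its own.
local_piecewise = normalize_table(psi)
```

Under these assumptions `belief_zero_iters` and `local_piecewise` coincide entry by entry, which is the zero-iteration correspondence in miniature. The one-iteration and two-iteration cases depend on the message schedule and are not sketched here.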