Conditional ML Estimation Using Rational Function Growth Transform

  • Ciprian Chelba,
  • Alex Acero

Proc. of the Snowbird Learning Workshop

We present a study on conditional maximum likelihood (CML) estimation of probability models by means of a well-known technique that generalizes the Baum-Eagon inequality [1] from polynomials to rational functions. The main advantage of the rational function growth transform (RFGT) method [5] is that it keeps the model parameter values (probabilities) properly normalized at each iteration. As a case study we apply the technique to discriminatively train a Naïve Bayes (NB) classifier; the same procedure underlies the discriminative training of HMMs in speech recognition [6]. The NB model trained under the maximum likelihood (ML) and CML criteria, respectively, is evaluated on a text classification problem. Smoothing is found to be a key component in increasing classification accuracy. A simple modification of the algorithm significantly increases the convergence speed, as measured by the increase in likelihood and classification accuracy per iteration, over a straightforward implementation of RFGT. The model trained under the CML criterion achieves a 40% relative improvement in classification accuracy over its ML counterpart [3]. The NB model can also be re-parameterized as the standard conditional exponential model encountered in maximum entropy (MaxEnt) estimation [2]. Although the two parameterizations are in principle equivalent and should lead to the same model when trained under CML, the conditional exponential model estimated using improved iterative scaling (IIS) and smoothed with a Gaussian prior [4] outperforms the smoothed NB model estimated using RFGT in terms of classification accuracy.
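For concreteness, the growth-transform update at the core of RFGT [5] can be sketched as follows (notation is ours, not taken from the paper). For a rational objective $R(\theta) = N(\theta)/D(\theta)$ in row-normalized parameters $\theta_{ij}$, $\sum_j \theta_{ij} = 1$, one iteration replaces $\theta$ by

    \hat{\theta}_{ij} = \frac{\theta_{ij}\,\bigl(\partial P / \partial \theta_{ij} + C\bigr)}
                             {\sum_{j'} \theta_{ij'}\,\bigl(\partial P / \partial \theta_{ij'} + C\bigr)},
    \qquad P(\vartheta) = N(\vartheta) - R(\theta)\,D(\vartheta),

with the derivatives evaluated at the current $\theta$ and $C$ a constant large enough to keep all factors positive. Because the denominator is the sum of the numerators over $j'$, each parameter row stays normalized by construction, which is the property emphasized above.

Applied to a multinomial NB classifier with class priors P(c) and word emission probabilities P(w|c), this reduces to an extended Baum-Welch style re-estimation driven by the gap between the true-class indicator and the model posterior. Below is a minimal numpy sketch of one such step, assuming word-count features and a fixed constant C; the variable names are ours, and the paper's smoothing scheme and convergence speed-up are not reproduced here:

    import numpy as np

    def rfgt_cml_step(pw_c, pc, X, y, C=100.0):
        """One RFGT (extended Baum-Welch style) step for CML training of NB.

        pw_c : (K, V) class-conditional word probabilities, rows sum to 1
        pc   : (K,) class priors, sums to 1
        X    : (N, V) word-count matrix
        y    : (N,) integer true-class labels
        C    : constant keeping update numerators positive (assumed fixed
               here; the paper's choice may differ)
        """
        # Posterior P(c | d) from log P(c) + sum_w count(w, d) log P(w | c)
        log_joint = X @ np.log(pw_c).T + np.log(pc)        # (N, K)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical guard
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)

        # "Discriminative counts": true-class indicator minus posterior
        resid = -post.copy()
        resid[np.arange(len(y)), y] += 1.0                 # (N, K)

        # Update P(w | c): theta * dF/dtheta collapses to resid-weighted counts
        num_w = resid.T @ X + C * pw_c                     # (K, V)
        num_w = np.maximum(num_w, 1e-12)                   # guard if C too small
        new_pw_c = num_w / num_w.sum(axis=1, keepdims=True)

        # Same update form for the class priors P(c)
        num_c = np.maximum(resid.sum(axis=0) + C * pc, 1e-12)
        new_pc = num_c / num_c.sum()
        return new_pw_c, new_pc

Iterating this step does not decrease the conditional likelihood for sufficiently large C. As for the re-parameterization mentioned above, taking $\lambda_{c,w} = \log P(w|c)$ and $\lambda_c = \log P(c)$ gives $P(c|d) \propto \exp\bigl(\lambda_c + \sum_w \mathrm{count}(w,d)\,\lambda_{c,w}\bigr)$, i.e., the standard conditional exponential form used in MaxEnt estimation [2].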