Bayesian Inference of Grammars

Mark Johnson (joint work with Sharon Goldwater and Tom Griffiths)
Even though Maximum Likelihood Estimation (MLE) of Probabilistic
Context-Free Grammars (PCFGs) is well understood (the Inside-Outside
algorithm can perform it efficiently from the terminal strings alone),
the inferred grammars are usually linguistically inaccurate. To better
understand why maximum likelihood finds poor grammars, this talk
examines two simple natural language induction problems: morphological
segmentation and word segmentation. We identify several problems with
the MLE PCFG models of these tasks and propose Hierarchical Dirichlet
Process (HDP) models to overcome them. To test these HDP models, we
develop MCMC algorithms for Bayesian inference of these models from
strings alone. Finally, we discuss to what extent the lessons learnt
from these examples can be put into a unified framework and applied to
the general problem of grammar induction.
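The claim that MLE works "from the terminal strings alone" rests on the inside computation, which sums the probability of every parse of a string without ever seeing a tree. The following is a minimal textbook sketch of that inside pass, not code from the talk. It assumes a PCFG in Chomsky normal form, and the function name `inside_probability`, the rule-dictionary layout and the toy grammar are all hypothetical illustrations.

```python
from collections import defaultdict


def inside_probability(words, binary_rules, lexical_rules, start="S"):
    """Return P(words) under a PCFG in Chomsky normal form (hypothetical helper).

    binary_rules:  dict mapping (A, B, C) -> probability of rule A -> B C
    lexical_rules: dict mapping (A, w)    -> probability of rule A -> w
    """
    n = len(words)
    # chart[i][j][A] = total probability that nonterminal A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Length-1 spans: apply lexical rules A -> w.
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                chart[i][i + 1][a] += p

    # Longer spans: sum over every split point k and every binary rule A -> B C.
    # This costs O(n^3 * |binary_rules|), the usual cubic inside recursion.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                for (a, b, c), p in binary_rules.items():
                    if b in left and c in right:
                        chart[i][j][a] += p * left[b] * right[c]

    # Probability of the whole string, summed over all of its parses.
    return chart[0][n][start]


# Toy grammar (made up for illustration only).
binary = {("S", "NP", "VP"): 1.0}
lexical = {
    ("NP", "dogs"): 0.6,
    ("NP", "cats"): 0.4,
    ("VP", "bark"): 0.5,
    ("VP", "sleep"): 0.5,
}
print(inside_probability(["dogs", "bark"], binary, lexical))  # 0.3
```

In the full Inside-Outside algorithm, a matching outside pass turns these chart entries into expected rule counts, which an EM step then renormalizes into new rule probabilities. That part is omitted here to keep the sketch short.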

Speaker Details

Currently Professor of Cognitive and Linguistic Sciences and Computer Science, Brown University
1987: PhD, Stanford University; post-doc, MIT
2003: President of the Association for Computational Linguistics
2006-2007: Visiting researcher, Microsoft Research

Date:
Speakers:
Mark Johnson
Affiliation:
Brown University
Jeff Running