Hierarchical Dirichlet Processes

We propose the hierarchical Dirichlet process (HDP), a hierarchical, nonparametric, Bayesian model for clustering problems involving multiple groups of data. Such grouped clustering problems occur often in practice, for example in topic discovery in document corpora (Hofmann 1999, Blei et al. 2003). Each group of data is modeled with a mixture, with the number of components being open-ended and inferred automatically by the model. Further, components can be shared across groups, which allows dependencies across groups to be modeled effectively and lets the model generalize to new groups. HDPs are a principled solution to the grouped clustering problem, admitting a variety of different representations and many possibilities for generalization. We report experimental results on three text corpora showing that the HDP is effective and outperforms previous models.
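
To make the idea of mixture components shared across groups concrete, the following is a minimal sketch of one representation used in the HDP literature, the Chinese restaurant franchise. It only draws component assignments from the prior; it does not perform inference and is not the implementation used in the reported experiments. The function name `sample_crf`, the parameter names `alpha` and `gamma`, and the group sizes are assumptions chosen for illustration.

```python
import random


def sample_crf(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Draw component labels from a Chinese restaurant franchise prior.

    Hypothetical helper for illustration only. Returns one list per group;
    entry i is the index of the shared component assigned to item i.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # m_k: tables, over all groups, using component k
    assignments = []
    for n_items in group_sizes:
        table_counts = []  # n_jt: number of items at each table in this group
        table_dish = []    # component used by each table in this group
        labels = []
        for _ in range(n_items):
            # Join an existing table with weight n_jt, or open a new table
            # with weight alpha (group-level concentration, assumed name).
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # A new table reuses a shared component with weight m_k, or
                # creates a new one with weight gamma (top-level concentration).
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            labels.append(table_dish[t])
        assignments.append(labels)
    return assignments


# Three groups of 50 items each; the number of components is not fixed in advance.
groups = sample_crf([50, 50, 50], alpha=1.0, gamma=1.0)
for j, labels in enumerate(groups):
    print(f"group {j}: components used = {sorted(set(labels))}")
```

Because new tables in every group choose from the same pool of components, the same component index can appear in several groups, which is the sharing described above. Larger values of `alpha` or `gamma` tend to produce more components.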

Technical Report: Hierarchical Dirichlet processes. Teh, Jordan, Beal and Blei (2004). UC Berkeley Department of Statistics, TR 653. Available at: http://www.cs.berkeley.edu/~ywteh/research/npbayes

Speaker Details

Yee Whye Teh is a postdoctoral fellow at UC Berkeley working with Prof. Michael Jordan. His work focuses on statistical models for machine learning and on applications of such models to computer vision. He obtained his PhD from the University of Toronto in 2003 under the supervision of Prof. Geoffrey Hinton; during his doctoral studies he spent two years at the Gatsby Computational Neuroscience Unit in London.

Date:
Speakers:
Yee Whye Teh
Affiliation:
UC Berkeley