Adding Domain Knowledge to Latent Topic Models

Around the turn of the century, a favorite pastime in machine learning was to inject various forms of domain knowledge into clustering. Examples include must-links, which require two items to be in the same cluster, and cannot-links, which require them to be in different clusters. Known collectively as constrained clustering, these methods produced clusters that were more relevant to domain experts. Fast-forward a decade, and the new favorite pastime is to inject various forms of domain knowledge into Latent Dirichlet Allocation (LDA).
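The two constraint types can be stated operationally. The following is a minimal sketch, not taken from the talk, of a checker that reports which must-link and cannot-link constraints a given cluster assignment breaks. The function name `violated_constraints` and its input formats are hypothetical. Constrained clustering algorithms such as COP-KMeans rely on this kind of test when deciding which cluster an item may join.

```python
from typing import Sequence


def violated_constraints(
    labels: Sequence[int],
    must_links: list[tuple[int, int]],
    cannot_links: list[tuple[int, int]],
) -> list[tuple[str, int, int]]:
    """Return every pairwise constraint that a clustering breaks.

    labels[i] is the cluster id assigned to item i (hypothetical format).
    """
    violations = []
    for i, j in must_links:
        # A must-link is broken when the two items land in different clusters.
        if labels[i] != labels[j]:
            violations.append(("must-link", i, j))
    for i, j in cannot_links:
        # A cannot-link is broken when the two items share a cluster.
        if labels[i] == labels[j]:
            violations.append(("cannot-link", i, j))
    return violations


labels = [0, 0, 1, 1]
print(violated_constraints(labels, must_links=[(0, 1), (1, 2)], cannot_links=[(2, 3)]))
# [('must-link', 1, 2), ('cannot-link', 2, 3)]
```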

The goal is to constrain the latent topic assignment of each word so that topic modeling is informed by both data and domain knowledge, and the resulting topics are more relevant to domain experts. We present several examples that our group has worked on, ranging from simple topic-in-set knowledge, where the latent topic of a word is restricted to a small set of candidate topics; through Dirichlet Forest, which allows must-links and cannot-links between words while maintaining conjugacy for efficient inference; to a general framework named Fold.all. Fold.all lets domain experts express arbitrary knowledge in human-friendly first-order logic and combines that knowledge with data using stochastic optimization. This approach enables domain experts to focus on high-level modeling goals instead of the low-level issues involved in creating a custom topic model.
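To make topic-in-set knowledge concrete, here is a minimal sketch assuming LDA is fit with standard collapsed Gibbs sampling and symmetric priors `alpha` and `beta`. The constraint enters as an indicator that zeroes the sampling probability of every topic outside a token's candidate set. The function `gibbs_topic_in_set` and the `allowed` dictionary format are hypothetical illustrations, not code from the talk.

```python
import numpy as np


def gibbs_topic_in_set(docs, n_topics, vocab_size, allowed,
                       alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with hard topic-in-set constraints.

    docs:    list of documents, each a list of integer word ids.
    allowed: hypothetical format, a dict mapping (doc index, token position)
             to a non-empty set of candidate topic ids; tokens missing from
             the dict may take any topic.
    Returns the final topic assignments and the topic-word count matrix.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # tokens per (document, topic)
    n_kw = np.zeros((n_topics, vocab_size))  # tokens per (topic, word)
    n_k = np.zeros(n_topics)                 # tokens per topic

    def candidate_mask(d, i):
        # 1.0 for topics in the candidate set, 0.0 elsewhere.
        mask = np.zeros(n_topics)
        mask[sorted(allowed.get((d, i), range(n_topics)))] = 1.0
        return mask

    masks = [[candidate_mask(d, i) for i in range(len(doc))]
             for d, doc in enumerate(docs)]

    def add(d, w, k, delta):
        # Update all three count tables for one token.
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_k[k] += delta

    # Initialise each token uniformly within its candidate set.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for i, w in enumerate(doc):
            m = masks[d][i]
            k = int(rng.choice(n_topics, p=m / m.sum()))
            z_d.append(k)
            add(d, w, k, 1)
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                add(d, w, z[d][i], -1)  # remove the current assignment
                # Standard collapsed LDA conditional for this token ...
                p = ((n_dk[d] + alpha) * (n_kw[:, w] + beta)
                     / (n_k + vocab_size * beta))
                # ... times the indicator of its candidate topic set.
                p *= masks[d][i]
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k
                add(d, w, k, 1)
    return z, n_kw


docs = [[0, 1, 2, 0], [3, 4, 3, 4], [0, 3, 1, 4]]
allowed = {(0, 0): {1}, (2, 0): {1}}  # pin these two tokens of word 0 to topic 1
z, n_kw = gibbs_topic_in_set(docs, n_topics=2, vocab_size=5,
                             allowed=allowed, n_iters=50)
print(z[0][0], z[2][0])  # 1 1
```

Because the indicator only removes probability mass, unconstrained tokens are sampled exactly as in plain LDA. A softer variant could down-weight, rather than exclude, topics outside the set; only the hard constraint is shown here.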

Speaker Details

Xiaojin (Jerry) Zhu is an assistant professor of computer science at the University of Wisconsin-Madison, with affiliate appointments in Electrical and Computer Engineering and Psychology. Dr. Zhu’s research is in machine learning, with interests in semi-supervised learning, computational cognitive science, and natural language processing. He received his Ph.D. in language technologies from the School of Computer Science at Carnegie Mellon University in 2005, and his M.S. and B.S. in computer science from Shanghai Jiao Tong University in 1996 and 1993, respectively. He was a research staff member at IBM China Research Laboratory from 1996 to 1998. He received a National Science Foundation CAREER award in 2010.

Date:
Speakers:
Jerry Zhu
Affiliation:
University of Wisconsin-Madison
Jeff Running