The Linear Algebraic Structure of Word Meanings

Word embeddings are often constructed with discriminative models such as deep nets and word2vec. Mikolov et al. (2013) showed that these embeddings exhibit linear structure that is useful for solving “word analogy tasks”. Subsequently, Levy and Goldberg (2014) and Pennington et al. (2014) tried to explain why such linear structure should arise in embeddings derived from nonlinear methods.
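
As a concrete illustration of the linear structure behind word analogy tasks, here is a minimal Python sketch that solves an analogy by vector arithmetic and cosine similarity. The four 3-dimensional vectors are hand-crafted placeholders chosen so the relationship holds exactly; they are not taken from any trained model, and trained embeddings only satisfy such relations approximately.

import numpy as np

# Toy, hand-crafted embeddings (hypothetical placeholders, not trained vectors).
# Coordinates loosely encode [royalty, maleness, femaleness].
vectors = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
}

def solve_analogy(a, b, c, vectors):
    # Return the word d that best completes "a is to b as c is to d",
    # scoring candidates by cosine similarity to b - a + c.
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_score = None, float("-inf")
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # exclude the query words themselves
        score = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

print(solve_analogy("man", "king", "woman", vectors))  # prints "queen"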

We provide a new generative model “explanation” for various word embedding methods as well as the above-mentioned linear structure. It also gives a generative explanation of older vector space methods such as the PMI method of Church and Hanks (1990). The model makes surprising predictions (e.g., the spatial isotropy of word vectors), which are verified empirically. It also leads directly to a linear algebraic understanding of how a word embedding behaves when the word is polysemous (has multiple meanings), and to a way of recovering the different meanings from the embedding.
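
For reference, the PMI statistic of Church and Hanks (1990) mentioned above is PMI(w, c) = log( p(w, c) / (p(w) p(c)) ). The short Python sketch below computes a PMI matrix from a word-by-context co-occurrence count matrix; the 3x3 count matrix is made up purely to show the computation, and in practice negative entries are often clipped to zero to obtain positive PMI (PPMI) before building vector representations.

import numpy as np

def pmi_matrix(counts, eps=1e-12):
    # PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ), computed from a
    # word-by-context co-occurrence count matrix.
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()               # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)      # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)      # context marginals
    return np.log((p_wc + eps) / (p_w * p_c + eps))

# Made-up co-occurrence counts, for illustration only.
counts = [[10, 2, 0],
          [ 3, 8, 1],
          [ 0, 1, 9]]
print(np.round(pmi_matrix(counts), 2))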

This methodology and generative model may be useful for other NLP tasks and neural models.

Joint work with Sanjeev Arora, Yuanzhi Li, Yingyu Liang, and Andrej Risteski (listed in alphabetical order).

Speaker Details

Tengyu Ma is currently a fourth-year graduate student at Princeton University, advised by Prof. Sanjeev Arora. He received the Simons Award for Graduate Students in Theoretical Computer Science in 2014 and the IBM Ph.D. Fellowship in 2015. Ma’s work seeks to develop efficient algorithms with provable guarantees for machine learning problems. His research interests include non-convex optimization, deep learning, natural language processing, distributed optimization, and convex relaxation (e.g., the sum-of-squares hierarchy) for machine learning problems.

Speakers:
Tengyu Ma
Jeff Running

Series: Microsoft Research Talks