The word vectors learned by continuous space language models are known to have
the property that the vectors of synonymous words have cosine similarities close
to one. To date, however, the relation of antonymy has not been captured in these models. In this paper, we demonstrate that by incorporating prior information from a thesaurus as an extra term in the language model objective function, the induced word vectors are useful not only for language modeling, but also for modeling both synonymy and antonymy. The learned vectors have the property that the vectors of antonymous words tend to have cosine similarities close to negative one, while synonymous words retain similarity close to positive one. We show that at a small cost in language model perplexity, the induced word vectors perform well on a standard GRE test of opposites.
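The abstract does not give the exact form of the extra objective term, but the target geometry it describes can be sketched directly. The snippet below uses hypothetical embeddings and an assumed squared-error penalty over thesaurus pairs; it is an illustration of the intended behavior, not the paper's actual model or training code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical learned embeddings illustrating the target geometry:
# synonyms near cosine +1, antonyms near cosine -1.
vec = {
    "happy": np.array([0.90, 0.10]),
    "glad":  np.array([0.88, 0.12]),
    "sad":   np.array([-0.90, -0.10]),
}

print(cosine(vec["happy"], vec["glad"]))  # close to +1 (synonyms)
print(cosine(vec["happy"], vec["sad"]))   # close to -1 (antonyms)

def thesaurus_penalty(vecs, synonyms, antonyms):
    """Assumed form of a thesaurus-based regularizer: pull synonym pairs
    toward cosine +1 and antonym pairs toward cosine -1."""
    pen = sum((1.0 - cosine(vecs[w], vecs[s])) ** 2 for w, s in synonyms)
    pen += sum((1.0 + cosine(vecs[w], vecs[a])) ** 2 for w, a in antonyms)
    return pen

# Small when the embeddings already respect the thesaurus relations.
print(thesaurus_penalty(vec, [("happy", "glad")], [("happy", "sad")]))
```

Adding a term like `thesaurus_penalty` to the language model objective (weighted by a tunable coefficient) is what trades a small amount of perplexity for the synonymy/antonymy structure described above.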