Chengtao Li, Jianwen Zhang, Jian-Tao Sun, and Zheng Chen
This paper addresses the problem of jointly mining topics, sentiments, and the association between them from online reviews in an unsupervised way. Previous methods often treat a sentiment as a special topic and assume that a word is generated from a flat mixture of topics, which yields unsatisfactory discriminative performance in sentiment analysis. A key reason is that it is difficult to provide rich priors on the polarity of a word for the flat mixture, because the polarity often depends on the topic. To solve this problem, we propose a novel model. We decompose the generative process of a word's sentiment polarity into a two-level hierarchy: the first level determines whether a word is used as a sentiment word or just an ordinary topic word, and the second level (if the word is used as a sentiment word) determines its polarity. With this decomposition, we provide a separate prior for the first level to encourage discrimination between sentiment words and ordinary topic words. This prior is relatively easy to obtain compared with a concrete prior on word polarities. We construct the prior from the part-of-speech tags of words and embed it into the model. Experiments on four real online review data sets show that our model consistently outperforms previous methods on the task of sentiment analysis, while also performing well on the sub-tasks of discovering ordinary topics, discovering sentiment-specific topics, and extracting topic-specific sentiment words.
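The two-level decomposition described above can be illustrated with a toy sampler. This is only a minimal sketch under stated assumptions, not the model from the paper: the names `POS_SENTIMENT_PRIOR`, `DEFAULT_PRIOR`, and `sample_word_role`, the part-of-speech tag labels, and all probability values are hypothetical and chosen purely for illustration. The actual model embeds the prior in a topic model and infers these quantities rather than fixing them by hand.

```python
import random

# Hypothetical first-level prior: probability that a word with a given
# part-of-speech tag is used as a sentiment word. Values are made up.
POS_SENTIMENT_PRIOR = {"ADJ": 0.8, "ADV": 0.6, "VERB": 0.3, "NOUN": 0.1}
DEFAULT_PRIOR = 0.2  # assumed fallback for tags not listed above


def sample_word_role(pos_tag, p_positive, rng):
    """Draw a word's role using a two-level hierarchy.

    Level 1: decide whether the word is a sentiment word or an ordinary
             topic word, using the POS-based prior.
    Level 2: only for sentiment words, decide the polarity, where
             p_positive stands in for a topic-dependent polarity probability.
    """
    p_sentiment = POS_SENTIMENT_PRIOR.get(pos_tag, DEFAULT_PRIOR)
    if rng.random() >= p_sentiment:
        return "topic"
    return "positive" if rng.random() < p_positive else "negative"


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {"topic": 0, "positive": 0, "negative": 0}
    for _ in range(10000):
        counts[sample_word_role("ADJ", 0.7, rng)] += 1
    print(counts)
```

The point of the sketch is that the first-level prior depends only on the part-of-speech tag, which is easy to obtain, while the polarity decision is deferred to a second level where it can depend on the topic.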
|Published in||SIAM International Conference on Data Mining (SDM'13)|
|Publisher||Society for Industrial and Applied Mathematics|