Polarity Classification of Celebrity Coverage in the Chinese Press

  • Benjamin K.Y. Tsou,
  • Raymond W.M. Yuen,
  • Oi Yee Kwong,
  • Tom B.Y. Lai,
  • Wei Lung Wong

Proceedings of the International Conference on Intelligence Analysis

The importance of media opinion on strategic subjects has long been recognized, and increasingly so for the foreign press. Recent interest in the automatic classification of summative views has focused on English (Wilson and Wiebe, 2003; Wilson et al., 2003) and has yielded fruitful results (Wiebe and Riloff, 2005). Interest in classifying polarity in the Chinese press is only beginning (Yuen et al., 2004). This paper reports on linguistic issues relevant to polarity in Chinese texts, an experimental annotation scheme, the derivation of polarity indices from annotated textual attributes, and the results of a comparison between manual and automatic classification. Manual classification was performed by four trained scorers, who used a scale of -5 to 5 to indicate the extent of positive or negative polarity of each news article. The sample comprised 600 articles, mainly on the U.S. and Taiwan presidential elections in 2004. Normalized results on the polarity of the news articles regarding four well-known political figures (John Kerry, George W. Bush, Junichiro Koizumi and Chen Shui-bian) show wide variations across the different Chinese communities, represented by Beijing, Hong Kong, Shanghai and Taipei. A system that automatically scores the polarity of the sample news articles is introduced; it initially incorporates a set of pre-selected polar lexical items along the lines of Yuen et al. (2004). The Spread, Density and Intensity of polar lexical items are explored. Spread indicates the extent to which the polar lexical items are distributed over an article, in terms of paragraphs or sentences. Density indicates how extensively polar lexical items occur within the polar paragraphs. Intensity refers to the strength of the polarity of the lexical items. The possible contributions of Spread and Density to improving the correlation with manual classification are explored in this paper.
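The abstract defines Spread, Density and Intensity only in words, so the sketch below is one possible reading rather than the authors' formulas. It assumes that an article arrives as a list of paragraphs already segmented into tokens, that Spread is the fraction of paragraphs containing at least one polar item, that Density is the share of tokens in those polar paragraphs that are polar items, and that Intensity is the mean absolute strength of the matched items. The function name `polarity_features`, the `PolarityFeatures` container, the toy lexicon and its strength values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolarityFeatures:
    spread: float     # assumed: fraction of paragraphs with at least one polar item
    density: float    # assumed: polar items per token inside polar paragraphs
    intensity: float  # assumed: mean absolute strength of matched polar items


def polarity_features(paragraphs, lexicon):
    """Compute illustrative Spread, Density and Intensity values.

    paragraphs: list of paragraphs, each a list of pre-segmented tokens.
    lexicon: dict mapping a polar lexical item to a signed strength
             (hypothetical values; the paper's lexicon is not reproduced here).
    """
    polar_paragraphs = 0   # paragraphs containing at least one polar item
    polar_hits = 0         # total polar items found
    tokens_in_polar = 0    # total tokens inside polar paragraphs
    strength_sum = 0.0     # sum of absolute strengths of polar items

    for tokens in paragraphs:
        hits = [lexicon[t] for t in tokens if t in lexicon]
        if hits:
            polar_paragraphs += 1
            polar_hits += len(hits)
            tokens_in_polar += len(tokens)
            strength_sum += sum(abs(s) for s in hits)

    # Guard every ratio against empty input or an article with no polar items.
    spread = polar_paragraphs / len(paragraphs) if paragraphs else 0.0
    density = polar_hits / tokens_in_polar if tokens_in_polar else 0.0
    intensity = strength_sum / polar_hits if polar_hits else 0.0
    return PolarityFeatures(spread, density, intensity)


# Toy input with placeholder English tokens standing in for segmented Chinese text.
toy_lexicon = {"good": 2, "terrible": -4}
toy_article = [
    ["a", "good", "day"],
    ["nothing", "here"],
    ["terrible", "terrible", "news", "today"],
]
features = polarity_features(toy_article, toy_lexicon)
```

On the toy article, two of three paragraphs are polar, three of the seven tokens in those paragraphs are polar items, and the absolute strengths 2, 4 and 4 are averaged. Counting by sentence instead of by paragraph, which the abstract also allows for Spread, would only change the unit passed in as `paragraphs`.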