Affect detection in tweets

The case for automatic affect detection:

Detecting affect in free text has a wide range of possible applications:

  • What are the positive and negative talking points of your customers?
  • What opinions are out there on products and services (on Twitter, on Facebook, in product reviews, etc.)?
  • How do mood and sentiment trend over time and across geography and populations?

There is a similarly wide range of techniques for detecting affect automatically: some systems use hand-curated word lists of positive and negative opinion terms, while others use statistical models trained on opinion-heavy text. One challenge is to build a system that works reasonably well across various domains and types of content. In other cases it is better to use a classifier specific to a particular task, and the challenge then lies in creating, or finding, enough annotated text to train it.
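
As a point of comparison, the word-list approach can be sketched in a few lines. This is a minimal illustration only: the two word lists are tiny, hypothetical examples rather than a real curated lexicon, and the function name `lexicon_affect` is made up for this sketch.

```python
import re

# Hypothetical, deliberately tiny word lists; a real system would use
# a much larger hand-curated lexicon.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad"}


def lexicon_affect(text):
    """Label a text by counting positive and negative opinion terms."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "undecided"  # no opinion terms found, or a tie


print(lexicon_affect("I love this great phone"))
```

A sketch like this has no notion of context or negation ("not good"), which is one reason statistical models trained on labelled text are often preferred.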

We recently conducted a study, grounded in the psychological literature, in which we identified over 150 different mood hashtags that people use on Twitter. We mapped these hashtags to positive and negative affect and used them as a training signal for identifying affect from the tweet text. We collected nearly four million tweets spanning one year and trained a text classifier on this data.
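
The idea of using hashtags as a training signal can be sketched as follows. The hashtag mapping below is a hypothetical four-entry stand-in for the 150+ hashtags in the study, and two details are assumptions of this sketch rather than statements about the study: mood hashtags are stripped from the text before training, and tweets with conflicting mood hashtags are discarded.

```python
import re

# Hypothetical sample mapping; the study's actual hashtag list is not shown here.
HASHTAG_AFFECT = {
    "#happy": "positive",
    "#excited": "positive",
    "#sad": "negative",
    "#angry": "negative",
}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(tweet):
    """Return (text, label) for a tweet, or None if it has no usable label."""
    tags = [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
    labels = {HASHTAG_AFFECT[tag] for tag in tags if tag in HASHTAG_AFFECT}
    if len(labels) != 1:
        return None  # no mood hashtag, or conflicting mood hashtags

    # Assumption: drop the mood hashtags so the classifier has to learn
    # from the remaining text instead of memorising the label.
    def strip_mood_tag(match):
        return "" if match.group().lower() in HASHTAG_AFFECT else match.group()

    text = " ".join(HASHTAG_RE.sub(strip_mood_tag, tweet).split())
    return text, labels.pop()


print(label_tweet("Got the job! #excited"))
```

Tweets labelled this way need no manual annotation, which is what makes a training set of several million examples practical.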

How the classifier works:

The classifier is trained on texts with known affect (positive or negative). From each text, words and word pairs are extracted and counted. At training time, the classification algorithm (a maximum entropy classifier) assigns numerical weights to the words and word pairs according to how strongly they correlate with positive or negative opinion. At runtime, words and word pairs are extracted from a new text in the same way and passed to the classifier, which looks up their weights and combines them in a classifier-specific mathematical formula. The output is a prediction (positive or negative) together with a probability.
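
The steps above can be sketched in plain Python. With only two classes, a maximum entropy classifier is equivalent to logistic regression, so that is what this sketch implements. Several things here are assumptions for illustration, not details of the actual system: "word pairs" is taken to mean adjacent words, training uses simple stochastic gradient ascent without regularisation, and all function names and the toy training texts are made up.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Count words and adjacent word pairs (assumed meaning of 'word pairs')."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = [first + " " + second for first, second in zip(words, words[1:])]
    return Counter(words + pairs)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def probability_positive(weights, features):
    """Look up the weight of each feature and combine them into P(positive)."""
    score = sum(weights.get(name, 0.0) * count for name, count in features.items())
    return sigmoid(score)


def train(examples, epochs=50, learning_rate=0.1):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    data = [(extract_features(text), 1.0 if label == "positive" else 0.0)
            for text, label in examples]
    weights = {}
    for _ in range(epochs):
        for features, target in data:
            error = target - probability_positive(weights, features)
            for name, count in features.items():
                weights[name] = weights.get(name, 0.0) + learning_rate * error * count
    return weights


def classify(weights, text):
    """Return (prediction, probability of that prediction) for a new text."""
    p = probability_positive(weights, extract_features(text))
    return ("positive", p) if p >= 0.5 else ("negative", 1.0 - p)


# Toy training data, invented for this example.
examples = [
    ("i love this so much", "positive"),
    ("what a great day", "positive"),
    ("i hate this so much", "negative"),
    ("what a terrible day", "negative"),
]
weights = train(examples)
print(classify(weights, "love this great day"))
```

Words that occur equally often in both classes (such as "this" or "day" in the toy data) end up with weights near zero, while words and pairs that occur in only one class (such as "love" or "terrible day") receive clearly positive or negative weights.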

Training time: