Harvesting and Summarizing User-Generated Content for Advanced Speech-Based HCI

JJ (Jingjing) Liu

IEEE Journal of Selected Topics in Signal Processing | December 2012, Vol 6(8)

Many Web-based platforms let people share user-generated content such as reviews, posts, blogs, and tweets. However, online communities and social networks are expanding so rapidly that it is impossible for people to digest all the information. To help users obtain information more efficiently, both the interface for data access and the information representation need to be improved. An intuitive and personalized interface, such as a dialogue system, could be an ideal assistant: it engages a user in a continuous dialogue to garner the user’s interest, assists the user via speech-navigated interactions, and harvests and summarizes Web data, presenting it in a natural way. This work, therefore, investigates a universal framework for developing speech-based interfaces that can aggregate user-generated content and present the summarized information via speech-based human-computer interactions. The challenge is two-fold. First, how can the semantics and sentiment of user-generated data be interpreted and aggregated into structured yet concise summaries? Second, how can a dialogue modeling mechanism be developed to present the highlighted information via natural language? This work explores plausible approaches to tackling these challenges. We will investigate a parse-and-paraphrase paradigm and a sentiment scoring mechanism for information extraction from unstructured user-generated content. We will also explore sentiment-involved opinion summarization and dialogue modeling approaches for aggregated information representation. A restaurant-domain prototype system has been implemented for demonstration.