Qiang Hao, Rui Cai, Changhu Wang, Rong Xiao, Jiang-Ming Yang, Yanwei Pang, and Lei Zhang
25 April 2010
With the growth of tourism and Web 2.0 technologies, more and more people are willing to share their travel experiences on the Web (e.g., in weblogs, forums, or Web 2.0 communities). These so-called travelogues contain rich information, in particular location-representative knowledge such as attractions (e.g., Golden Gate Bridge), styles (e.g., beach, history), and activities (e.g., diving, surfing). If correctly extracted and summarized, the location-representative information in travelogues can greatly facilitate other tourists’ trip planning. However, since most travelogues are unstructured and contain much noise, it is difficult for common users to utilize such knowledge effectively. In this paper, to mine location-representative knowledge from a large collection of travelogues, we propose a probabilistic topic model, named the Location-Topic model. This model has the advantages of (1) differentiating between two kinds of topics, i.e., local topics, which characterize locations, and global topics, which represent other common themes shared by various locations, and (2) representing locations in the local topic space to encode both location-representative knowledge and similarities between locations. Several novel applications are developed based on the proposed model, including (1) destination recommendation for flexible queries, (2) characteristic summarization for a given destination with representative tags and snippets, and (3) identification of the informative parts of a travelogue and enrichment of such highlights with related images. Based on a large collection of travelogues, the proposed framework is evaluated using both objective and subjective evaluation methods and shows promising results.
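To make the idea of representing locations in a local topic space more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the topic names, the location vectors, and the helper functions `cosine`, `most_similar`, and `recommend` are all hypothetical, the numbers are made up for illustration, and cosine similarity is only an assumed stand-in for whatever similarity measure the Location-Topic model actually uses.

```python
import math

# Hypothetical local topics and per-location topic distributions.
# All values are invented for illustration; each vector sums to 1.
LOCAL_TOPICS = ["beach", "history", "diving"]
LOCATION_TOPICS = {
    "Honolulu": [0.6, 0.1, 0.3],
    "Cancun": [0.5, 0.1, 0.4],
    "Rome": [0.05, 0.9, 0.05],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name, k=2):
    """Rank the other locations by similarity to `name` in the local topic space."""
    target = LOCATION_TOPICS[name]
    scored = [
        (other, cosine(target, vec))
        for other, vec in LOCATION_TOPICS.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def recommend(query_topics, k=2):
    """Rank locations against a query given as a set of local topic names."""
    # Encode the query as an indicator vector over the local topics.
    query = [1.0 if topic in query_topics else 0.0 for topic in LOCAL_TOPICS]
    scored = [(name, cosine(query, vec)) for name, vec in LOCATION_TOPICS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    print(most_similar("Honolulu"))
    print(recommend({"history"}))
```

With these invented vectors, the location closest to Honolulu is Cancun, and a query for "history" ranks Rome first. Keeping only local topics in the vectors reflects the abstract's point that global topics are shared across locations and so carry little location-specific signal.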
In Proceedings of the 19th International World Wide Web Conference (WWW 2010)
Publisher: Association for Computing Machinery, Inc.