#TAIA 2012 Accepted Papers

Query-Specific Recency Ranking: Survival Analysis for Improved Microblog Retrieval

M. Efron

Abstract: We offer a preliminary application of survival analysis to the problem of ranking documents in a time-aware setting. Specifically, we focus on microblog retrieval where recency is often positively correlated with relevance. We propose a method of temporal retrieval that estimates the parameter of an exponential distribution by partial maximum likelihood.

Bieber no more: First Story Detection using Twitter and Wikipedia

M. Osborne, S. Petrovic, R. McCreadie, C. Macdonald, I. Ounis

Abstract: Twitter is a well-known source of information regarding breaking news stories. This aspect of Twitter makes it ideal for identifying events as they happen. However, a key problem with Twitter-driven event detection approaches is that they produce many spurious events, i.e., events that are wrongly detected or simply are of no interest to anyone. In this paper, we examine whether Wikipedia (when viewed as a stream of page views) can be used to improve the quality of discovered events in Twitter. Our results suggest that Wikipedia is a powerful filtering mechanism, allowing for easy blocking of large numbers of spurious events. Our results also indicate that events within Wikipedia tend to lag behind Twitter.

Hashtags as Milestones in Time

S. Whiting, O. Alonso

Abstract: On Twitter, hashtags are commonly used by authors wishing to explicitly mention the relevant topic(s) contained in their message, especially when the text quantity is limited. Hashtags are widely used by both tweet authors and users searching for specific information. As such, frequently used hashtags reflect mainstream events in real-time, or ongoing memes. In this paper we propose an approach to identify significant event-based hashtags and use them as annotations for constructing timelines. To do this, we correlate the periods of hashtag use in the Bing Social search engine query-log with the periods of high page viewing popularity of the Wikipedia article(s) most related to the words contained in the hashtag. Preliminary results suggest that the technique is effective for hashtags arising from large-scale events, including hurricanes, celebrities, music releases, TV shows and sports games.


Temporally-Aware Signals for Social Search

A. Khodaei, O. Alonso

Abstract: We propose the inclusion of temporal characteristics into social features when studying social search. The incredible amount of content that users generate in social networks offers a tremendous opportunity to examine how users produce and consume such content over time. By examining users' activities over time, there is potential to learn more about when those bursts of interaction occur. This position paper makes the case for incorporating time as another aspect when investigating social search, and for how temporal signals can be used to improve current search scenarios or provide new ones.


Activity Prediction: A Twitter-based Exploration

W. Weerkamp, M. de Rijke

Abstract: Social media platforms allow users to share their messages with everyone else. In microblogs, e.g., Twitter, people mostly report on what they did, they talk about current activities, and mention things they plan to do in the near future. In this paper, we propose the task of activity prediction, that is, trying to establish a set of activities that are likely to become popular at a later time. We perform a small-scale initial experiment, in which we try to predict popular activities for the coming evening using Dutch Twitter data. Our experiment shows the feasibility and challenges of the task, with a simple method resulting in human-readable activities. This exploration also identifies several issues (e.g., temporal phrases and activity classification) that need to be addressed in future work.


OpenGeist: Insight in the Stream of Page Views on Wikipedia

M. Peetz, E. Meij, M. de Rijke

Abstract: We present a RESTful interface that captures insights into the zeitgeist of Wikipedia users. The system is an interface for clustering and comparing concepts based on the time series of the number of views of their Wikipedia page. The functionality is motivated by three use cases, ranging from technical novice to expert user, and we also provide two real-life example applications.


Sustainable Questions

B. de Goede, A. Schuth, M. de Rijke

Abstract: Community question answering platforms have large repositories of already answered questions. Reusing these answers for new questions is tempting. However, not all stored answers will still be relevant. In this study, we define a new and challenging problem concerning the sustainability of questions and answers, and present metrics aimed at distinguishing between sustainable and unsustainable questions. We find that an intuitive approach to sustainability of questions is not sufficient, but that simple properties can already distinguish between sustainable questions and others.


Time-Aware Exploratory Search: Exploring Word Meaning through Time

D. Odijk, G. Santucci, M. de Rijke, M. Angelini, G. Granato

Abstract: With more longitudinal archives becoming digitized and publicly available, new uses emerge. Collections that span centuries call for a time-aware exploration approach, a coordinated environment supporting understanding the development of word usage and meaning through time, with the means to leverage this for exploration. We present ongoing work on a coordinated time-aware exploratory search approach and present a case study on a prototype system. With this approach, a user is able to gain a deeper understanding of the relevant parts of the collection.


Identifying Relevant Temporal Expressions for Real-World Events

N. Kanhabua, S. Romano, A. Stewart

Abstract: Event detection is an interesting task for many real-world applications, for instance, surveillance, scientific discovery, and Topic Detection and Tracking. Numerous works have focused on identifying an event from unstructured text documents and determining what constitutes the event, e.g., key terms or entities. Although the aforementioned work is able to determine the interesting time period of an event, there is a lack of research on identifying the top relevant time for a given event. In this paper, we propose an approach to extracting real-world events, namely, disease outbreaks, from unstructured text documents. In addition, we employ a machine learning algorithm to identify the top relevant time for a given event, by proposing three classes of features, namely, sentence-based, document-based and corpus-specific features. Through extensive experiments using real-world data and 3,500 manually judged relevance pairs, we show that our proposed approach is able to identify the relevant time of events with good accuracy.