﻿<?xml version="1.0" encoding="utf-8" standalone="no"?>
<rss version="2.0">
  <channel>
    <title>Microsoft Research Publications</title>
    <link>http://research.microsoft.com/apps/dp/pu/publications.aspx</link>
    <description>Keep current with all the latest Microsoft Research Publications and Technical Reports</description>
    <copyright>© 2009 Microsoft Corporation. All rights reserved.</copyright>
    <language>en-US</language>
    <lastBuildDate>Sun, 08 Nov 2009 08:00:19 GMT</lastBuildDate>
    <ttl>2880</ttl>
    <item>
      <title>Map-Matching for Low-Sampling-Rate GPS Trajectories</title>
      <description>Map-matching is the process of aligning a sequence of observed user positions with the road network on a digital map. It is a fundamental pre-processing step for many applications, such as moving object management, traffic flow analysis, and driving directions. In practice, there exists a huge number of low-sampling-rate (e.g., one point every 2-5 minutes) GPS trajectories. Unfortunately, most current map-matching approaches only deal with high-sampling-rate (typically one point every 10-30s) GPS data, and become less effective for low-sampling-rate points as the uncertainty in data increases. In this paper, we propose a novel global map-matching algorithm called ST-Matching for low-sampling-rate GPS trajectories. ST-Matching considers (1) the spatial geometric and topological structures of the road network and (2) the temporal/speed constraints of the trajectories. Based on spatio-temporal analysis, a candidate graph is constructed from which the best matching path sequence is identified. We compare ST-Matching with the incremental algorithm and the Average-Fréchet-Distance (AFD) based global map-matching algorithm. The experiments are performed on both synthetic and real datasets. The results show that our ST-Matching algorithm significantly outperforms the incremental algorithm in terms of matching accuracy for low-sampling-rate trajectories. Meanwhile, compared with the AFD-based global algorithm, ST-Matching also improves both accuracy and running time.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=105051</link>
      <pubDate>Wed, 04 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Model-Based Testing of Web Applications using NModel</title>
      <description>We show how model-based on-the-fly testing can be applied in the context of web applications using the NModel toolkit. The concrete case study is a commercial web-based positioning system called WorkForce Management (WFM), which interacts with a number of other services, such as billing and positioning, through a mobile operator. We describe the application and the testing, and discuss the test results.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101196</link>
      <pubDate>Mon, 02 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>A Machine Learning Approach for Improved BM25 Retrieval</title>
      <description>Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness over BM25 and BM25F. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures. We demonstrate the advantages of our model on a very large real-world Web data collection.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102751</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Avatar Movement in World of Warcraft Battlegrounds</title>
      <description>Evaluating DVE topology management and message propagation schemes requires avatar movement models. Most models are based on reasoned assumptions rather than measured data, potentially biasing evaluation. We measured player movement in World of Warcraft battlegrounds, and compared our observations against common assumptions about player avatar movement and navigation. We found that when modeling a highly interactive DVE such as a battleground, a waypoint model is not sufficient to describe most avatar movement. We were surprised to find that despite game incentives for grouping, the majority of avatar movement between objectives is individual, not grouped. Finally, we found that a hotspot-based model for avatar movement is consistent with our traces.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=103338</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Can Access Control be Extended to Deal with Data Handling in Privacy Scenarios?</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=105065</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Characterizing Podcast Services: Publishing, Usage, and Dissemination</title>
      <description>In this paper, we aim to characterize podcast services from both publishers' and users' perspectives, and to analyze the implications of these characteristics for the design of efficient dissemination systems. Specifically, our goal is to characterize how podcasting content is generated and published, and how users subscribe to and consume podcasts. We are also interested in understanding whether podcast episodes are efficiently disseminated to users using only sporadic direct access to the Internet (which is the current way of downloading podcast episodes), or whether the use of peer-to-peer mobile device-to-device dissemination systems could help enhance the performance of podcast services. Our study is based on traces of podcast episode releases, subscriptions, and play times from major podcast service providers. An extensive analysis of the traces allows us to develop a comprehensive model of current podcast services, and provides statistics about the type and content of typical podcasts, the size and release frequencies of their episodes, as well as their popularity. By studying podcast usage, we show that the service is delay-tolerant, as users may well play podcast episodes a long time after their actual release. An interesting consequence of this delay tolerance is that mobile device-to-device dissemination systems would not be very useful for the current typical podcasts, while they may become more attractive for future interactive podcast services.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101674</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Clustering Queries for Better Document Ranking</title>
      <description>Different queries require different ranking methods. It is, however, challenging to determine which queries are similar, and how to rank documents for them. In this paper, we propose a new method to cluster queries according to a similarity measure based on the URLs in their answers. We then train specific ranking models for each query cluster. In addition, a cluster-specific measure of authority is defined to favor documents from authoritative websites on the corresponding topics. The proposed approach is tested using data from a search engine. The results show that our proposed topic-dependent models can significantly improve the search results for the eight most popular categories of queries.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=103236</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Context-Aware Online Commercial Intention Detection</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102413</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Exploiting Term Relationship To Boost Text Classification</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102411</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Improving Privacy and Security in Multi-Authority Attribute-Based Encryption</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102476</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Mapping Kernel Objects to Enable Systematic Integrity Checking</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101328</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>On Meeting Lifetime Goals and Providing Constant Application Quality</title>
      <description>Most work in sensor networks tries to maximize network lifetime. However, for many applications the required lifetime is known in advance. Therefore, application quality should rather be maximized for that given time. Levels, the approach presented in this article, is a programming abstraction for energy-aware sensor network applications that helps to meet such a user-defined lifetime goal by deactivating optional functionality. With this programming abstraction, the application developer defines so-called energy levels. Functionality in energy levels is deactivated if the required lifetime cannot be met otherwise. The runtime system uses data about the energy consumption of different levels to compute an optimal level assignment that maximizes each node’s quality for the time remaining. As described in this paper, Levels includes a completely distributed coordination algorithm that balances energy level assignments and keeps the application quality of the network roughly constant over time. In this approach, each node computes its schedule based on those of its neighbors. As the evaluation shows, applications using Levels can accurately meet given lifetime goals with only small fluctuations in application quality. In addition, the runtime overhead both for computation and for communication is negligible.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=112189</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>On the Effectiveness of Unit Test Automation at Microsoft</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102349</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Patient Controlled Encryption: patient privacy in electronic medical records</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102475</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Post-Rank Reordering: Resolving Preference Misalignments between Search Engines and End Users</title>
      <description>No search engine is perfect. A typical type of imperfection is the preference misalignment between search engines and end users, e.g., from time to time, web users skip higher-ranked documents and click on lower-ranked ones. Although search engines have been aggressively incorporating clickthrough data in their ranking, it is hard to eliminate such misalignments across millions of queries. Therefore, in this paper, we propose to accompany a search engine with an “always-on” component that reorders documents on a per-query basis, based on user click patterns. Because of positional bias and dependencies between clicks, we show that a simple sort based on click counts (and its variants), albeit intuitive and useful, is not precise enough. We put forward a principled approach to reordering documents by leveraging existing click models. Specifically, we compute the preference probability that a lower-ranked document is preferred to a higher-ranked one from the Click Chain Model (CCM), and propose to swap the two documents if the probability is sufficiently high. Because CCM models positional bias and dependencies between clicks, this method readily accounts for many twisted heuristics that have to be manually encoded in sort-based approaches. For this approach to be practical, we further devise two approximation schemes that make online computation of the preference probability feasible. We carried out a set of experiments based on real-world data from a major search engine, and the results clearly demonstrate the effectiveness of the proposed approach.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101427</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Semi-Supervised Learning of Semantic Classes for Query Understanding – from the Web and for the Web</title>
      <description>Understanding intents from search queries can improve a user's search experience and boost a site's advertising profits. Query tagging via statistical sequential labeling models has been shown to perform well, but annotating the training set for supervised learning requires substantial human effort. Domain-specific knowledge, such as semantic class lexicons, reduces the amount of manual annotation needed, but much human effort is still required to maintain these lexicons as search topics evolve over time. This paper investigates semi-supervised learning algorithms that leverage structured data (HTML lists) from the Web to automatically generate semantic-class lexicons, which are used to improve query tagging performance -- even with far less training data. We focus our study on understanding the correct objectives for the semi-supervised lexicon learning algorithms that are crucial for the success of query tagging. Prior work on lexicon acquisition has largely focused on the precision of the lexicons, but we show that precision is not important if the lexicons are used for query tagging. A more adequate criterion should emphasize a trade-off between maximizing the recall of semantic class instances in the data and minimizing the confusability. This ensures that similar levels of precision and recall are observed on both the training and test sets, and hence prevents over-fitting the lexicon features in a sequential labeling model. Experimental results on retail product queries from a commercial search engine show that enhancing a query tagger with lexicons learned based on this objective reduces word-level tagging errors by up to 25% compared to the baseline tagger that does not use any lexicon features. In contrast, lexicons obtained through a precision-centric learning algorithm even degrade the performance of a tagger compared to the baseline. Furthermore, the proposed method outperforms one in which semantic class lexicons have been extracted from a structured database.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101154</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Socializing or Knowledge Sharing? Characterizing Social Intent in Community Question Answering</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101140</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>The Case for VM-based Cloudlets in Mobile Computing</title>
      <description>Resource poverty is a fundamental constraint that severely limits the class of applications that can be run on mobile devices. This constraint is not just a temporary limitation of current technology, but is intrinsic to mobility. In this paper, we put forth a vision of mobile computing that breaks free of this fundamental constraint. In this vision, mobile users seamlessly utilize nearby computers to obtain the resource benefits of cloud computing without incurring WAN delays and jitter. Rather than relying on a distant “cloud,” a mobile user instantiates a “cloudlet” on nearby infrastructure and uses it via a wireless LAN. Crisp interactive response for immersive applications that augment human cognition is then much easier to achieve because of the proximity of the cloudlet. We confirm that a critical untested aspect of this vision, namely rapid customization of cloudlet infrastructure, is achievable through dynamic VM synthesis. While much remains to be done, the concepts and ideas introduced here open the door to a new world of mobile computing in which seamless cognitive assistance of users occurs in diverse ways at any time and place.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102364</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Using Socio-Technical Networks to Predict Failures</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102348</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>“i-Internet? Intle” (beautiful): Exploring first time internet use via mobile phones in a South African women’s collective</title>
      <description>This paper reports results of an ethnographic action research study exploring mobile-centric internet use. Over the course of 13 weeks, eight women, each a member of a livelihoods collective in urban Cape Town, South Africa, received training to make use of the data (internet) features on the phones they already owned. None of the women had previous exposure to PCs or the internet. Activities focused on social networking, entertainment, information search, and, in particular, job searches. Results of the exercise reveal both the promise of, and barriers to, mobile internet use by a potentially large community of first-time, mobile-centric users. Discussion focuses on the importance of self-expression and identity management in the refinement of online and offline presences, and considers these forces relative to issues of gender and socioeconomic status.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=112390</link>
      <pubDate>Fri, 30 Oct 2009 07:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>