﻿<?xml version="1.0" encoding="utf-8" standalone="no"?>
<rss version="2.0">
  <channel>
    <title>Microsoft Research Publications</title>
    <link>http://research.microsoft.com/apps/dp/pu/publications.aspx</link>
    <description>Keep current with all the latest Microsoft Research Publications and Technical Reports</description>
    <copyright>© 2009 Microsoft Corporation. All rights reserved.</copyright>
    <language>en-US</language>
    <lastBuildDate>Mon, 23 Nov 2009 08:00:20 GMT</lastBuildDate>
    <ttl>2880</ttl>
    <item>
      <title>Action Prediction and Identification From Mining Temporal User Behavior</title>
      <description>Predicting a user's actions provides many monetization opportunities to web service providers. If a user's future action can be predicted and identified correctly in time or in advance, we can not only satisfy the user's current need, but also facilitate and simplify the user's future online activities. Traditional work on user behavior modeling, such as implicit feedback or personalization, mainly investigates users' immediate, short-term, or aggregate behaviors. As such, it is difficult to understand the diversity in user behavior and predict a user's future action. In this paper, we consider a forecasting problem of temporal user behavior modeling. Our first objective is to predict whether a user will perform an action. The second objective is to identify whether a user has finished the action, even when the action happened offline. We propose an ensemble algorithm to achieve both objectives. The experiment compares several implementation methods and illustrates how to build the temporal behavior model to capture relevant users with high precision.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115415</link>
      <pubDate>Fri, 20 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Anonymity-Preserving Data Aggregation using Anonygator</title>
      <description>Data aggregation is a key aspect of many distributed applications, such as distributed sensing, performance monitoring, and distributed diagnostics. In such settings, user anonymity is a key concern of the participants. In the absence of an assurance of anonymity, users may be reluctant to contribute data such as their location or configuration settings on their computer. In this paper, we present the design, analysis, implementation, and evaluation of Anonygator, an anonymity-preserving data aggregation service for large-scale distributed applications. Anonygator uses anonymous routing to provide user anonymity by disassociating messages from the hosts that generated them. It prevents malicious users from uploading disproportionate amounts of spurious data by using a lightweight accounting scheme. Finally, Anonygator maintains overall system scalability by employing a novel distributed tree-based data aggregation procedure that is robust to pollution attacks. All of these components are tuned by a customization tool, with a view to achieving specific anonymity, pollution resistance, and efficiency goals. To demonstrate the usefulness of Anonygator, we have used it to prototype three applications, one of which we have evaluated on PlanetLab. The other two have been evaluated on a local testbed.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115420</link>
      <pubDate>Fri, 20 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Enforcing Stateful Authorization and Information Flow Policies in Fine</title>
      <description>Proving software free of security bugs is hard. Programming language support to ensure that programs correctly enforce their security policies would help, but, to date, no language has the ability to verify the enforcement of the kinds of policies used in practice---dynamic, stateful policies which address a broad range of concerns including forms of access control and information flow tracking. This paper presents Fine, a new source-level security-typed language that, through the use of a simple module system and dependent, refinement, and affine types, checks the enforcement of dynamic security policies applied to real software. Fine is proven sound. A prototype implementation of the compiler and several example programs are available.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115445</link>
      <pubDate>Fri, 20 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Grouped Pay-Per-Click: a Novel Mechanism Bridging Pay-Per-Click and Pay-Per-Action</title>
      <description>There are three dominant business models in the online advertising industry: pay-per-impression (PPI), pay-per-click (PPC), and pay-per-action (PPA). With the growth of sponsored search, there has been a move from the PPI model toward the PPC model, as it significantly reduces the advertiser's risk of being charged for delivering advertisements to irrelevant users. While the PPC model cannot satisfy advertisers due to click fraud, the PPA model often puts the search engine in an unfair situation because it promotes advertiser awareness to a group of users, yet very few actions actually follow. In this paper, we propose a novel mechanism, called grouped pay-per-click (Grouped PPC), which takes advantage of both PPC and PPA. This mechanism gives advertisers an opportunity to partition the clicked users into several groups and submit separate bids on different user groups. The search engine maintains a tailored classifier for each advertiser to differentiate incoming users beyond the keyword. Thus, the auction is executed on both the keyword and user group levels. We give an empirical analysis of user post-ad-click behavior from a real behavior log and validate that both the advertiser and the search engine have an incentive to adopt the grouped PPC model. We derive the truthfulness payment and prove that grouped PPC is a more efficient mechanism than PPC. Furthermore, we describe how to use machine learning techniques to build the classifier to classify users according to the advertiser's supervision. One advantage of grouped PPC is that it can be smoothly integrated into the PPC model so that advertisers from PPC have the freedom to decide whether to refine cost-per-click according to user groups.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115416</link>
      <pubDate>Fri, 20 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Roles, Stacks, Histories: A Triple for Hoare</title>
      <description>Behavioural type and effect systems regulate properties such as adherence to object and communication protocols, dynamic security policies, avoidance of race conditions, and many others. Typically, each system is based on some specific syntax of constraints, and is checked with an ad hoc solver. Instead, we advocate types refined with first-order logic formulas as a basis for behavioural type systems, and general purpose automated theorem provers as an effective means of checking programs. To illustrate this approach, we define a triple of security-related type systems: for role-based access control, for stack inspection, and for history-based access control. The three are all instances of a refined state monad. Our semantics allows a precise comparison of the similarities and differences of these mechanisms. In our examples, the benefit of behavioural type-checking is to rule out the possibility of unexpected security exceptions, a common problem with code-based access control.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101538</link>
      <pubDate>Fri, 20 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Holmes Beta 1.0</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115394</link>
      <pubDate>Thu, 19 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>ConScript: Specifying and Enforcing Fine-Grained Security Policies for JavaScript in the Browser</title>
      <description>Much of the power of the modern Web comes from the ability of a Web page to combine content and JavaScript code from disparate servers on the same page. While the ability to create such mash-ups is attractive to both the user and the developer because of the extra functionality, code inclusion effectively opens the hosting site up to attacks and poor programming practices within every JavaScript library or API it chooses to use. In other words, expressiveness comes at the price of losing control. To regain control, it is therefore valuable to provide a means for the hosting page to restrict the behavior of the code that it may include. This paper presents ConScript, a client-side advice implementation for security, built on top of Internet Explorer 8. ConScript allows the hosting page to express fine-grained application-specific security policies that are enforced at runtime. In addition to presenting 17 wide-ranging security and reliability policies that ConScript enables, we also show how policies can be generated automatically through static analysis of server-side code or runtime analysis of client-side code. We also present a type system that helps ensure correctness of ConScript policies. To show the practicality of ConScript in a range of settings, we compare the overhead of ConScript enforcement and conclude that it is significantly lower than that of other systems proposed in the literature, both on micro-benchmarks and on large, widely-used applications such as MSN, GMail, Google Maps, and Live Desktop.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115390</link>
      <pubDate>Wed, 18 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Lost in Translation: Forgetful Semantic Anchoring</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=103238</link>
      <pubDate>Mon, 16 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Finding heap-bounds for hardware synthesis</title>
      <description>Dynamically allocated and manipulated data structures cannot be translated into hardware unless there is an upper bound on the amount of memory the program uses during all executions. This bound can depend on the generic parameters to the program, i.e., program inputs that are instantiated at synthesis time. We propose a constraint based method for the discovery of memory usage bounds, which leads to the first known C-to-gates hardware synthesis supporting programs with non-trivial use of dynamically allocated memory, e.g., linked lists maintained with malloc and free. We illustrate the practicality of our tool on a range of examples.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=81080</link>
      <pubDate>Sun, 15 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Notifications and Awareness: A Field Study of Alert Usage and Preferences</title>
      <description>Desktop notifications are designed to provide awareness of information while a user is attending to a primary task. Unfortunately, the awareness can come at the price of disruption to the task at the focus of attention. We review results of a field study on the use and perceived value of email notifications in the workplace. We recorded users’ interactions with software applications for two weeks and studied how notifications or their forced absence influenced users’ quest for awareness of new email arrival, as well as the impact of notifications on their overall task focus. Results showed that users view notifications as a mechanism to provide passive awareness rather than a trigger to switch tasks. Turning off notifications causes some users to self-interrupt more to explicitly monitor email arrival, while others appear to be able to better focus on their tasks. Users acknowledge notifications as disruptive, yet opt for them because of their perceived value in providing awareness.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=81068</link>
      <pubDate>Sat, 14 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Ripley: Automatically Securing Web 2.0 Applications Through Replicated Execution</title>
      <description>Rich Internet applications are becoming increasingly distributed, as demonstrated by the popularity of AJAX or Web 2.0 applications such as Facebook, Google Maps, Hotmail and many others. A typical multi-tier AJAX application consists of a server component implemented in Java J2EE, PHP or ASP.NET and a client-side component executing in JavaScript. The resulting application is more responsive because computation is moved closer to the client, avoiding unnecessary network round trips for frequent user actions. However, once a portion of the code is moved to the client, a malicious user can subvert the client side of the computation, jeopardizing the integrity of the server-side state. In this paper we propose Ripley, a system that uses replicated execution to automatically preserve the integrity of a distributed computation. Ripley replicates a copy of the client-side computation on the trusted server tier. Every client-side event is transferred to the replica of the client for execution. Ripley observes results of the computation, both as computed on the client-side and on the server side using the replica of the client-side code. Any discrepancy is flagged as a potential violation of computational integrity. We built Ripley on top of Volta, a distributing compiler that translates .NET applications into JavaScript, effectively providing a measure of security by construction for Volta applications. We have evaluated the Ripley approach on five representative AJAX applications built in Volta and also Hotmail, a large widely-used AJAX application. Our results so far suggest that Ripley provides a promising strategy for building secure distributed Web applications, which places minimal burden on the application developer at the cost of a low performance overhead.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101931</link>
      <pubDate>Tue, 10 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Map-Matching for Low-Sampling-Rate GPS Trajectories</title>
      <description>Map-matching is the process of aligning a sequence of observed user positions with the road network on a digital map. It is a fundamental pre-processing step for many applications, such as moving object management, traffic flow analysis, and driving directions. In practice, there exists a huge amount of low-sampling-rate (e.g., one point every 2-5 minutes) GPS trajectories. Unfortunately, most current map-matching approaches only deal with high-sampling-rate (typically one point every 10-30s) GPS data, and become less effective for low-sampling-rate points as the uncertainty in the data increases. In this paper, we propose a novel global map-matching algorithm called ST-Matching for low-sampling-rate GPS trajectories. ST-Matching considers (1) the spatial geometric and topological structures of the road network and (2) the temporal/speed constraints of the trajectories. Based on spatio-temporal analysis, a candidate graph is constructed from which the best matching path sequence is identified. We compare ST-Matching with the incremental algorithm and the Average-Fréchet-Distance (AFD) based global map-matching algorithm. The experiments are performed on both synthetic and real datasets. The results show that our ST-Matching algorithm significantly outperforms the incremental algorithm in terms of matching accuracy for low-sampling-rate trajectories. Meanwhile, when compared with the AFD-based global algorithm, ST-Matching also improves accuracy as well as running time.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=105051</link>
      <pubDate>Wed, 04 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Model-Based Testing of Web Applications using NModel</title>
      <description>We show how model-based on-the-fly testing can be applied in the context of web applications using the NModel toolkit. The concrete case study is a commercial web-based positioning system called WorkForce Management (WFM) which interacts with a number of other services, such as billing and positioning, through a mobile operator. We describe the application and the testing, and discuss the test results.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101196</link>
      <pubDate>Mon, 02 Nov 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>A Machine Learning Approach for Improved BM25 Retrieval</title>
      <description>Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness over BM25 and BM25F. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures. We demonstrate the advantages of our model on a very large real-world Web data collection.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102751</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Avatar Movement in World of Warcraft Battlegrounds</title>
      <description>Evaluating DVE topology management and message propagation schemes requires avatar movement models. Most models are based on reasoned assumptions rather than measured data, potentially biasing evaluation. We measured player movement in World of Warcraft battlegrounds, and compared our observations against common assumptions about player avatar movement and navigation. We found that when modeling a highly interactive DVE such as a battleground, a waypoint model is not sufficient to describe most avatar movement. We were surprised to find that despite game incentives for grouping, the majority of avatar movement between objectives is individual, not grouped. Finally, we found that a hotspot-based model for avatar movement is consistent with our traces.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=103338</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Can Access Control be Extended to Deal with Data Handling in Privacy Scenarios?</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=105065</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Characterizing Podcast Services: Publishing, Usage, and Dissemination</title>
      <description>In this paper, we aim at characterizing podcast services both from publishers' and users' perspectives, and at analyzing the implications of these characteristics on the design of efficient dissemination systems. Specifically, our goal is to characterize how podcasting content is generated and published, and how users subscribe to and consume podcasts. We are also interested in understanding whether podcast episodes are efficiently disseminated to users just using a sporadic direct access to the Internet (which is the current way of downloading podcast episodes), or whether the use of peer-to-peer mobile device-to-device dissemination systems could help enhance the performance of podcast services. Our study is based on traces of podcast episode releases, subscriptions, and play times from major podcast service providers. An extensive analysis of the traces allows us to develop a comprehensive model of current podcast services, and provides statistics about the type and content of the typical podcasts, the size and the release frequencies of their episodes, as well as their popularity. By studying podcast usage, we show that the service is delay-tolerant, as users may well play podcast episodes a long time after their actual release. An interesting consequence of this delay tolerance is that mobile device-to-device dissemination systems would not be very useful for the current typical podcasts, while they may become more attractive for future interactive podcast services.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=101674</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Clustering Queries for Better Document Ranking</title>
      <description>Different queries require different ranking methods. It is, however, challenging to determine what queries are similar, and how to rank documents for them. In this paper, we propose a new method to cluster queries according to the similarity determined based on URLs in their answers. We then train specific ranking models for each query cluster. In addition, a cluster-specific measure of authority is defined to favor documents from authoritative websites on the corresponding topics. The proposed approach is tested using data from a search engine. It turns out that our proposed topic-dependent models can significantly improve the search results of the eight most popular categories of queries.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=103236</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Context-Aware Online Commercial Intention Detection</title>
      <description />
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=102413</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Evaluating Recommender Systems</title>
      <description>Recommender systems are now popular both commercially and in the research community, where many approaches have been suggested for providing recommendations. In many cases, a system designer who wishes to employ a recommendation system must choose between a set of candidate approaches. A first step towards selecting an appropriate algorithm is to decide which properties of the application to focus upon when making this choice. Indeed, recommendation systems have a variety of properties that may affect user experience, such as accuracy, robustness, scalability, and so forth. In this paper we discuss how to compare recommenders based on a set of properties that are relevant for the application. We focus on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms. We describe experimental settings appropriate for making choices between algorithms. We review three types of experiments, starting with an offline setting, where recommendation approaches are compared without user interaction, then reviewing user studies, where a small group of subjects experiment with the system and report on the experience, and finally describing large-scale online experiments, where real user populations interact with the system. In each of these cases we describe types of questions that can be answered, and suggest protocols for experimentation. We also discuss how to draw trustworthy conclusions from the conducted experiments. We then review a large set of properties, and explain how to evaluate systems given relevant properties. We also survey a large set of evaluation metrics in the context of the property that they evaluate.</description>
      <link>http://research.microsoft.com/apps/pubs/default.aspx?id=115396</link>
      <pubDate>Sun, 01 Nov 2009 07:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>