Our research
##### Publication details
Date: 1 October 2015
Type: Article
Publisher: ASME

In data centers, caches work both to provide low IO latencies and to reduce the load on the back-end network and storage. But they are not designed for multi-tenancy; system-level caches today cannot be configured to match tenant or provider objectives. Exacerbating the problem is the increasing number of uncoordinated caches on the IO data plane. The lack of global visibility on the control plane to coordinate this distributed set of caches leads to inefficiencies, increasing cloud provider cost....

##### Publication details
Date: 27 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery

Modern datacenter applications demand high throughput (40Gbps) and ultra-low latency (< 10 microseconds per hop) from the network, with low CPU overhead. Standard TCP/IP stacks cannot meet these requirements, but Remote Direct Memory Access (RDMA) can. On IP-routed datacenter networks, RDMA is deployed using the RoCEv2 protocol, which relies on Priority-based Flow Control (PFC) to enable a drop-free network. However, PFC can lead to poor application performance due to problems like head-of-line...

##### Publication details
Date: 17 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery

We introduce cost allocation in the cloud: the problem of attributing the costs produced by virtual machine workloads in the data center to each player in the cloud. A player here can be the size of a virtual machine type or the count of virtual machines in the user-decided virtual machine packets. We draw this problem from an analysis of the cost of real workloads in Microsoft Azure. To the best of our knowledge, cost allocation in the cloud has never been studied before. Based on the fact that the players in the cloud form...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: SoCC 2015 (Poster)

In a modern web application, a single high-level action like a mouse click triggers a flurry of asynchronous events on the client browser and remote web servers. We introduce Domino, a new tool which automatically captures and analyzes the end-to-end, asynchronous causal relationships of events that span clients and servers. Using Domino, we found uncharacteristically long event chains in Bing Maps, discovered data races in the WinJS implementation of promises, and developed a new server-side scheduling...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM Symposium on Cloud Computing conference (SOCC)
##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: KDD

To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input data fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM SIGCOMM

GRAM is an efficient and scalable graph engine for a large class of widely used graph algorithms. It is designed to scale up to multicores on a single server, as well as scale out to multiple servers in a cluster, offering significant, often over an order-of-magnitude, improvement over existing distributed graph engines on evaluated graph algorithms. GRAM is also capable of processing graphs that are significantly larger than previously reported. In particular, using 64 servers (1,024 physical cores),...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
##### Publication details
Date: 1 August 2015
Type: Proceedings
Publisher: HOTCHIPS

Modern data analytical tasks often involve very wide tables, from a few hundred columns to a few thousand. While it is commonly agreed that column stores are an appropriate data format for wide tables, the order of columns has long been neglected. Column ordering plays an important role in I/O performance, because these tables are so wide that accessing columns in a single horizontal partition may involve multiple disk seeks. In this paper, we study the problem of column ordering for column stores on...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: SoCC 2015 (Poster)
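
The column-ordering abstract above notes that reading a few columns from a very wide table can cost several disk seeks. A minimal sketch of why order matters, under an assumed toy cost model that is not taken from the paper: columns are stored contiguously in `column_order`, and each maximal run of adjacent requested columns costs one seek. The function name `count_seeks` is hypothetical.

```python
def count_seeks(column_order, query_columns):
    """Count seeks under an assumed cost model: one seek per contiguous
    run of requested columns in the on-disk column order."""
    wanted = set(query_columns)
    seeks = 0
    previous_wanted = False
    for column in column_order:
        is_wanted = column in wanted
        # A new run starts when a wanted column follows an unwanted one.
        if is_wanted and not previous_wanted:
            seeks += 1
        previous_wanted = is_wanted
    return seeks


# The same query touches two runs in one layout and one run in another.
scattered = count_seeks(["a", "b", "c", "d"], ["a", "c"])
adjacent = count_seeks(["a", "c", "b", "d"], ["a", "c"])
```

Under this model, placing columns that are queried together next to each other reduces the seek count; a real column store would also need to weigh column sizes and the full query workload.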

Many cloud applications can benefit from guaranteed latency for their network messages; however, providing such predictability is hard, especially in multi-tenant datacenters. We identify three key requirements for such predictability: guaranteed network bandwidth, guaranteed packet delay and guaranteed burst allowance. We present Silo, a system that offers these guarantees in multi-tenant datacenters. Silo leverages the tight coupling between bandwidth and delay: controlling tenant bandwidth leads to...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
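
The Silo abstract above names guaranteed bandwidth and a guaranteed burst allowance as requirements. One common way to express a rate limit together with a burst allowance is a token bucket; the sketch below illustrates that general technique only, and the abstract does not state that Silo is built this way. The class name, parameter names, and the explicit `now` timestamps are assumptions made for the example.

```python
class TokenBucket:
    """Generic token bucket: sustained rate plus a bounded burst.

    Illustrative only; not taken from the Silo paper.
    """

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # long-term sending rate
        self.capacity = burst_bytes    # largest burst allowed at once
        self.tokens = burst_bytes      # start with a full bucket
        self.last = 0.0                # time of the last update, in seconds

    def allow(self, size_bytes, now):
        """Return True and consume tokens if a packet of size_bytes may be sent at time now."""
        elapsed = now - self.last
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
first = bucket.allow(1500, now=0.0)   # the full burst is available at the start
second = bucket.allow(100, now=0.0)   # bucket is empty, so this is refused
third = bucket.allow(400, now=0.5)    # 500 bytes of tokens have accumulated
```

Passing the clock in as `now` keeps the example deterministic; a real limiter would read a monotonic clock instead.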

Many network functions executed in modern datacenters, e.g., load balancing, application-level QoS, and congestion control, exhibit three common properties at the data plane: they need to access and modify state, to perform computations, and to access application semantics --- this is critical since many network functions are best expressed in terms of application-level messages. In this paper, we argue that the end hosts are a natural enforcement point for these functions and we present Eden, an...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery

Rack-scale computers, comprising a large number of micro-servers connected by a direct-connect topology, are poised to replace servers as the building block in data centers. We focus on the problem of routing and congestion control across the rack's network, and find that high path diversity in rack topologies, in combination with workload diversity across it, means that traditional solutions are inadequate.

We present R2C2, a network stack for rack-scale computers providing flexible and...

##### Publication details
Date: 1 August 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery

Size and weight constraints on wearables limit their battery capacity and restrict them from providing rich functionality. The need for durable and secure storage for personal data further compounds this problem as these features incur energy-intensive operations. This paper presents WearDrive, a fast storage system for wearables based on battery-backed RAM and an efficient means to offload energy intensive tasks to the phone. WearDrive leverages low-power network connectivity available on...

##### Publication details
Date: 10 July 2015
Type: Inproceeding
Publisher: USENIX
Awards: Best Paper Award

To cope with the ever-growing availability of training data, there have been several proposals to scale machine learning computation beyond a single server and distribute it across a cluster. While this reduces training time, the observed speedup is often limited by network bottlenecks.

To address this, we design MLNet, a host-based communication layer that aims to improve the network performance of distributed machine learning systems. This is achieved through a combination of...

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: USENIX – Advanced Computing Systems Association

The problem of electing a leader from among n contenders is one of the fundamental questions in distributed computing. In its simplest formulation, the task is as follows: given n processors, all participants must eventually return a win or lose indication, such that a single contender may win. Despite a considerable amount of work on leader election, the following question is still open: can we elect a leader in an asynchronous fault-prone system faster than...

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery

Temporal graphs that capture graph changes over time are attracting increasing interest from research communities, for functions such as understanding temporal characteristics of social interactions on a time-evolving social graph. ImmortalGraph is a storage and execution engine designed and optimized specifically for temporal graphs. Locality is at the center of ImmortalGraph’s design: temporal graphs are carefully laid out in both persistent storage and memory, taking into account data locality in...

##### Publication details
Date: 1 July 2015
Type: Article
Publisher: ACM – Association for Computing Machinery

Population protocols, roughly defined as systems consisting of large numbers of simple identical agents, interacting at random and updating their state following simple rules, are an important research topic at the intersection of distributed computing and biology. One of the fundamental tasks that a population protocol may solve is majority: each node starts in one of two states; the goal is for all nodes to reach a correct consensus on which of the two states was initially the majority....

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
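
The population-protocol abstract above defines the majority task: every node starts in one of two states, and all nodes should agree on the initial majority. As an illustration of the model, here is a sketch of the well-known three-state approximate-majority protocol from the earlier literature; it is a swapped-in textbook example, not the protocol of the listed paper, and it is correct only with high probability when the initial margin is large enough. The function name, the `"_"` blank state, and the interaction cap are assumptions.

```python
import random


def approximate_majority(num_a, num_b, max_interactions=1_000_000, seed=0):
    """Simulate the three-state approximate-majority protocol.

    Agents hold "A", "B", or "_" (blank). In each interaction a random
    initiator meets a random responder:
      - an opinionated initiator meeting the opposite opinion blanks the responder;
      - an opinionated initiator meeting a blank responder converts it.
    Returns "A" or "B" on consensus, or None if the cap is reached.
    """
    rng = random.Random(seed)
    agents = ["A"] * num_a + ["B"] * num_b
    n = len(agents)
    counts = {"A": num_a, "B": num_b, "_": 0}
    for _ in range(max_interactions):
        if counts["A"] == n:
            return "A"
        if counts["B"] == n:
            return "B"
        i, j = rng.sample(range(n), 2)  # distinct initiator and responder
        initiator, responder = agents[i], agents[j]
        if initiator == "_" or initiator == responder:
            continue  # blank initiators and equal opinions change nothing
        new_state = initiator if responder == "_" else "_"
        counts[responder] -= 1
        counts[new_state] += 1
        agents[j] = new_state
    return None


outcome = approximate_majority(num_a=30, num_b=5)
```

Because the protocol is randomized and only approximate, the outcome for a mixed population is not guaranteed; a unanimous population returns its own opinion immediately.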

Datacenter-scale computing for analytics workloads is increasingly common. High operational costs force heterogeneous applications to share cluster resources for achieving economy of scale. Scheduling such large and diverse workloads is inherently hard, and existing approaches tackle this in two alternative ways: 1) centralized solutions offer strict, secure enforcement of scheduling invariants (e.g., fairness, capacity) for heterogeneous applications, 2) distributed solutions offer...

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: USENIX – Advanced Computing Systems Association

Current data storage on smartphones mostly inherits from desktop/server systems a flash-centric design: The memory (DRAM) effectively acts as an I/O cache for the relatively slow flash. To improve both app responsiveness and energy efficiency, this paper proposes MobiFS, a memory-centric design for smartphone data storage. This design no longer exercises cache writeback at short fixed periods or on file synchronization calls. Instead, it incrementally checkpoints app data into flash at appropriate...

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: USENIX – Advanced Computing Systems Association

In this work, we consider the following random process, motivated by the analysis of lock-free concurrent algorithms under high memory contention. In each round, a new scheduling step is allocated to one of $n$ threads, according to a distribution $\mathbf{p} = (p_1, p_2, \ldots, p_n)$, where thread $i$ is scheduled with probability $p_i$. When some thread first reaches a set threshold of executed steps, it registers a "win", completing its current operation, and resets its step count to $1$. At...

##### Publication details
Date: 1 July 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery
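
The random process in the abstract above can be simulated directly for the part that is stated: a scheduler picks thread $i$ with probability $p_i$ each round, and a thread that reaches the threshold registers a win and resets its step count to 1. The sketch below is a minimal version under two assumptions: every thread starts with a step count of 0, and whatever the truncated remainder of the abstract specifies is not modeled. The function name `simulate_wins` is hypothetical.

```python
import random


def simulate_wins(p, threshold, rounds, seed=0):
    """Simulate the stated part of the process and return wins per thread.

    p         -- scheduling probabilities, one per thread
    threshold -- step count at which a thread registers a win
    rounds    -- number of scheduling steps to allocate
    """
    rng = random.Random(seed)
    n = len(p)
    steps = [0] * n   # assumed initial step counts
    wins = [0] * n
    for _ in range(rounds):
        # Pick one thread according to the distribution p.
        i = rng.choices(range(n), weights=p, k=1)[0]
        steps[i] += 1
        if steps[i] >= threshold:
            wins[i] += 1
            steps[i] = 1  # reset to 1, as stated in the abstract
    return wins


# With a degenerate distribution the run is deterministic:
# thread 0 wins on its 3rd step, is reset to 1, and wins again two steps later.
degenerate = simulate_wins([1.0, 0.0], threshold=3, rounds=5)
```

Varying `p` away from uniform is one way to explore how skewed scheduling shifts wins between threads in this simplified model.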

Applications can map data on SSDs into virtual memory to transparently scale beyond DRAM capacity, permitting them to leverage high SSD capacities with few code changes. Obtaining good performance for memory-mapped SSD content, however, is hard because the virtual memory layer, the file system and the flash translation layer (FTL) perform address translations, sanity and permission checks independently from each other. We introduce FlashMap, an SSD interface that is optimized for memory-mapped...

##### Publication details
Date: 13 June 2015
Type: Inproceeding
Publisher: ACM/IEEE

Gaming on phones, tablets and laptops is very popular. Cloud gaming -- where remote servers perform game execution and rendering on behalf of thin clients that simply send input and display output frames -- promises any device the ability to play any game any time. Unfortunately, the reality is that wide-area network latencies are often prohibitive; cellular, Wi-Fi and even wired residential end host round trip times (RTTs) can exceed 100ms, a threshold above which many gamers tend to deem...

##### Publication details
Date: 3 June 2015
Type: Proceedings
Publisher: ACM – Association for Computing Machinery

Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of SHIM, a continuous profiler that samples at resolutions as fine as 15 cycles, three to five orders of magnitude finer than current continuous profilers. SHIM’s...

##### Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: ACM/IEEE International Symposium on Computer Architecture (ISCA)

Modern software applications and services operate nowadays on top of large clusters and datacenters. To reduce the underlying infrastructure cost and increase utilization, different services share the same physical resources (e.g., CPU, bandwidth, I/O, memory). Consequently, the cluster provider often has to decide in real-time how to allocate resources in overbooked systems, taking into account the different characteristics and requirements of users. In this paper, we consider an important problem...

##### Publication details
Date: 1 June 2015
Type: Inproceeding
Publisher: ACM – Association for Computing Machinery