Spark-CLR is a cross-company open-source project that provides C# language bindings for Apache Spark. Spark is a cluster computing framework built around two core programming abstractions: Resilient Distributed Datasets (RDDs), which are logical collections of data partitioned across machines, and Discretized Streams (DStreams), which are temporal sequences of RDDs.
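To make the two abstractions concrete, here is a minimal sketch in Python. It assumes a toy, single-process model: `ToyRDD`, `ToyDStream`, and `parallelize` are hypothetical names chosen for illustration, not the Spark API and not Spark-CLR's C# bindings. Each partition is an in-memory list standing in for data that would live on a separate machine, and a DStream is simply an ordered list of RDDs, one per batch interval.

```python
import functools
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyRDD:
    """A logical collection whose elements are split into partitions.

    In a real cluster each partition would live on a different machine;
    here (hypothetically) every partition is just an in-memory list.
    """
    partitions: List[list]

    def map(self, fn: Callable) -> "ToyRDD":
        # Transformations run independently per partition, with no data movement.
        return ToyRDD([[fn(x) for x in part] for part in self.partitions])

    def reduce(self, fn: Callable):
        # Reduce each non-empty partition locally, then combine the partial results.
        partials = [functools.reduce(fn, part) for part in self.partitions if part]
        return functools.reduce(fn, partials)

    def collect(self) -> list:
        # Gather all partitions back into a single list, in partition order.
        return [x for part in self.partitions for x in part]


def parallelize(data: list, num_partitions: int) -> ToyRDD:
    # Split data into contiguous chunks of at most ceil(len / num_partitions) items.
    size = max(1, -(-len(data) // num_partitions))
    return ToyRDD([data[i:i + size] for i in range(0, len(data), size)])


@dataclass
class ToyDStream:
    """A discretized stream: one RDD per batch interval, in time order."""
    batches: List[ToyRDD]

    def map(self, fn: Callable) -> "ToyDStream":
        # A stream-level operation is applied to the RDD of every batch.
        return ToyDStream([rdd.map(fn) for rdd in self.batches])


# Usage: two batch intervals, each holding a small partitioned RDD.
stream = ToyDStream([parallelize([1, 2, 3, 4], 2), parallelize([5, 6], 2)])
totals = [rdd.reduce(lambda a, b: a + b) for rdd in stream.map(lambda x: x * 10).batches]
print(totals)  # [100, 110]
```

The point of the design is that per-partition work (the `map` and the local half of `reduce`) needs no coordination, while a stream operation is nothing more than the same RDD operation repeated for each batch.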
Resource poverty in mobile devices is a fundamental constraint and not simply a temporary limitation of current technology. In this talk, I will put forth a vision and propose a technology that breaks free of this constraint. In this vision, mobile users seamlessly use nearby micro datacenters to obtain the resource benefits of cloud computing without incurring wide area network delays and jitter. Crisp interactive response for immersive applications that augment human cognition becomes easier to
Connecting the Next Billion Users to the Broadband Internet
Seabed is a project to provide analytics over encrypted Big Data. The challenge is to develop fast yet secure cryptographic techniques that support a suite of applications such as Business Intelligence tools and large-scale Machine Learning frameworks. Currently, we are building Seabed into Apache Spark.
The Distributed Social Analytics Platform (DSoAP) project is focused on the “Huge Data” problem in social policy research caused by the breadth of data involved. Using aggregate social media data to investigate and validate social issues such as employment, health and fiscal policy requires analyzing many months or years of data. DSoAP is applying intelligent compaction, pre-indexing and distribution of data across a server cluster to achieve responsive query times for online data exploration.
The amount of digital data produced has long been outpacing the amount of storage available. This project develops archival storage that encodes data at the molecular level in DNA, leveraging advances in biotechnology for synthesizing, manipulating, and sequencing DNA.
DCQCN is a congestion control protocol for large-scale RDMA networks, developed jointly by Microsoft and Mellanox.
MWT is a toolbox of machine learning technology for principled and efficient experimentation, plausibly applicable to most Microsoft services that interact with customers.
File System for Approximate Storage
The PinDrop project focuses on building the substrate for supporting high-quality real-time streaming over wired and wireless networks.
Robust Distributed System Nucleus (rDSN) is an open framework for quickly building and managing high-performance, robust distributed systems. The core idea is a coherent, principled design under which distributed systems, tools, and frameworks can be developed independently and later integrated (almost) transparently.
Graph Engine, previously known as Trinity, is a distributed, in-memory, large graph processing engine.
The proliferation of connected devices can in theory enable a range of applications that make rich inferences about users and their environment. In practice, however, developing such applications today is arduous: they are constructed as monolithic silos, tightly coupled to sensing devices, and must implement all sensing and inference logic themselves, even as devices move or are temporarily disconnected. Our goal is to break down restrictive device-application silos and simplify app development.
The Kamino project explores how systems should adopt new memory technologies, including SSDs (NAND flash), battery-backed DRAM, and emerging non-volatile memory technologies (phase-change memory, memristors, spin-transfer torque memory, etc.), for increased performance and efficiency. The project explores how best to leverage such new memory technologies inside systems of all sizes and shapes, from mobile devices to datacenter scale.
This project looks into the design and evaluation of efficient, deployable algorithms for assigning complex workloads to resources in modern cloud service platforms.
Project Catapult is a Microsoft venture that investigates the use of field-programmable gate arrays (FPGAs) to improve performance, reduce power, and provide new capabilities in the datacenter.
Parasail is a novel approach to parallelizing a large class of seemingly sequential applications wherein dependencies are, at runtime, treated as symbolic values. The efficiency of parallelization, then, depends on the efficiency of the symbolic computation, an active area of research in static analysis, verification, and partial evaluation. This is exciting as advances in these fields can translate to novel parallel algorithms for sequential computation.
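The idea of treating a dependency as a symbolic value can be illustrated with a minimal Python sketch. It assumes a deliberately simple case that is not taken from Parasail itself: a hypothetical loop whose body is the affine update `state = m * state + c`. Each chunk of the input is run on a symbolic starting state `x`; because the body is affine, the symbolic result always stays in the closed form `a * x + b`, so summarizing a chunk is cheap. The chunk summaries are then composed in order and applied to the concrete initial state. The function names are hypothetical, and the thread pool only illustrates that chunks no longer depend on each other (CPython threads do not speed up this CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sequential loop being parallelized:
#     state = init
#     for (m, c) in items:
#         state = m * state + c
# Each step reads the previous state, so it looks inherently sequential.


def sequential(items, init):
    state = init
    for m, c in items:
        state = m * state + c
    return state


def summarize(chunk):
    """Run the loop body on a symbolic input state x.

    Because the body is affine, the symbolic result is always a * x + b,
    so the summary of a whole chunk is just the pair (a, b).
    """
    a, b = 1, 0  # identity map: x -> x
    for m, c in chunk:
        # m * (a * x + b) + c == (m * a) * x + (m * b + c)
        a, b = m * a, m * b + c
    return a, b


def compose(first, second):
    # Apply `first`, then `second`: second(first(x)).
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2


def parallel(items, init, num_chunks=4):
    size = max(1, -(-len(items) // num_chunks))
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Each chunk is summarized independently of the others.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, chunks))
    # Stitch the summaries together in order, then plug in the concrete input.
    a, b = reduce(compose, summaries, (1, 0))
    return a * init + b


items = [(2, 1), (3, -1), (1, 5), (4, 0), (2, 2), (5, -3), (1, 1)]
print(sequential(items, 7) == parallel(items, 7))  # True
```

For loop bodies that are not affine, the symbolic values can grow without bound, which is why, as noted above, the efficiency of this style of parallelization hinges on the efficiency of the symbolic computation.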
An Ironclad App lets a user securely transmit her data to a remote machine with the guarantee that every instruction executed on that machine adheres to a formal abstract specification of the app's behavior. This does more than eliminate implementation vulnerabilities such as buffer overflows, parsing errors, or data leaks; it tells the user exactly how the app will behave at all times.
An overhead-constrained logging system
Data compression is essential in large-scale data centers for saving both storage and network bandwidth. Current software-based methods suffer from high computational cost and limited performance. In this project, we are offloading this fundamental system workload to an FPGA accelerator, aiming for high throughput and high energy efficiency while also freeing CPU resources.
Software-defined radios (SDRs) have the potential to bring major innovation to wireless networking design. However, their impact so far has been limited by complex programming tools. Ziria addresses this problem. It consists of a novel programming language and an optimizing compiler, and it can synthesize highly efficient SDR code from a high-level PHY description written in the Ziria language.
This project uses automatic techniques to reduce the MTTR of large-scale online service systems.
MODIST is a practical software model checker for unmodified concurrent, distributed, and cloud systems. MODIST systematically explores different execution paths and simulates a variety of environment faults to discover subtle corner-case defects. We have applied MODIST to Oracle Berkeley DB, MPS (a Paxos implementation), SQL Azure, Windows Azure Storage, and other real systems, and found many new bugs.
This is the website of the rack-scale computing research project at MSRC.
This project re-imagines and re-engineers wide area networks, to more than double their efficiency and allow flexible sharing of resources.