We work on questions motivated by machine learning, in particular from the theoretical and computational perspectives. Our goals are to mathematically understand the effectiveness of existing learning algorithms and to design new learning algorithms. We combine expertise from diverse fields such as algorithms and complexity, statistics, and convex geometry.
Labs: Silicon Valley
MSR has a strong group of scientists working on algorithm design, analysis, and experimental evaluation, as well as researchers in related areas (e.g., coding theory), but no formal algorithms group. The Virtual Algorithms Center (VIRAL) brings these individuals together. The goals of the center is to enhance collaboration between algorithms researchers and the rest of MSR, provide internal consulting, and give an external view of the algorithms research at MSR.
Tabular is a Excel add-in that brings the power of model based machine learning to data enthusiasts. It allows the user to write a simple model that explains their data and perform Bayesian inference. Tabular is built on top of Infer.NET.
Crowded: Digital Piecework and the Politics of Platform Responsibility in Precarious Times looks as crowdsourcing as a focal point for many of the issues that are raised by the structure of our current information economy: economic value, cultural meaning, and ethics.
Labs: New England
This is an umbrella project for our activity in machine learning with exploration-exploitation tradeoff. Most of us are at MSR-NYC.
We work on fundamental issues in crowdsourcing, in particular, incentive mechanisms for paid crowdsourcing, algorithms and theory for crowdsourced problem solving.
This project focuses on advancing the state-of-the-art in language processing with recurrent neural networks. We are currently applying these to language modeling, machine translation, speech recognition, language understanding and meaning representation. A special interest in is adding side-channels of information as input, to model phenomena which are not easily handled in other frameworks.
It has become increasingly difficult to process information from the vast amount of data collected and stored in different computer systems all around us. The main challenge is how to design distributed and scalable algorithms and systems to extract useful information from this large quantity of data. The goal of this project is to address various questions related to distributed and scalable processing of big data.
The performance of the elliptic curve method (ECM) for integer factorizationplays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. This webpage gives addition-subtracting chains to optimize Edwards ECM in terms of both performance and memory requirements. See for more details the "ECM at Work" paper.
Labs: eXtreme Computing Group
Rainbow Tangle is a game on a sequence of permutations. The purpose of the game is to create as straight routes of the permutation particles as possible.
We work on novel sampling techniques to design faster algorithms for solving problems on large data. Algorithmic applications of our methods include dimensionality reduction, clustering, and linear algebraic problems on large matrices.
In the last 25 years, Elliptic Curve Cryptography (ECC) has become a mainstream primitive for cryptographic protocols and applications. ECC has been standardized for use in key exchange and digital signatures. This project focuses on efficient generation of parameters and implementation of ECC and pairing-based crypto primitives, across architectures and platforms.
LIQUi|> (pronounced "liquid") is a software architecture and toolsuite for quantum computing. It is currently being developed and will include programming languages, optimization and scheduling algorithms, and quantum simulators. Ultimately, LIQUi|> will be used to translate a quantum algorithm written in the form of a high-level program into the low-level machine instructions for a quantum device.
We introduced a game-theoretic framework for crowdsourcing systems.
Counterdog is an automated theorem prover for a counterfactual meta-logic on propositional Datalog. The prover is complete for the logic, and can prove (or disprove) counterfactual statements such as: "if 'p' is false in a Datalog program, but would be true if it contained 'a:-b', then 'b' is true in the program.". Counterdog is useful for reasoning about Datalog-based trust management languages. The theory and implementation were developed by Moritz Y. Becker and Nik Sultana.
In recent years, computational challenges have become more and more important to infer biologically relevant information from the vast amount of experimental data available to systems biologists. Building on work originally done by Marc Mezard and Riccardo Zecchina in the context of random instances of satisfiability, we are developing various computationally efficient algorithms for problems in systems biology.
Labs: New England
This is an umbrella project for several related efforts at Microsoft Research Silicon Valley in the areas of social network analysis and human computation.
Labs: Silicon Valley
What is the link between how problems are logically described in a formal language, and how they are solved by algorithms?
Time Series Foundation (TSF) is a .NET toolset for exploring new algorithms in time series analysis and forecasting. TSF is based on state space model methodology that includes all types of exponential smoothing, some autoregressive algorithms, and innovative algorithms for event detection and calendar event impact prediction. TSF implements an Excel interop layer and offers an Excel add-in that exposes a large subset of TSF functionality through the Excel ribbon UI.
Automata is a .NET tool kit that provides facilities for manipulating and analyzing regular expressions, symbolic finite automata, and symbolic finite transducers. It supports automata and transducers where input and output alphabets can be fully symbolic. Constraints over the alphabets can be analyzed using Satisfiability Modulo Theory (SMT) solvers. The tool kit provides a particular extension that uses the Microsoft SMT solver Z3.
It is often the case that mutually distrustful parties need to perform a joint computation but cannot afford to reveal their inputs to each other. This can occur, for example, during auctions, data mining, voting, negotiations and business analytics. Secure multi-party computation (MPC) allows a set of parties, each with a private input, to securely and jointly perform any computation over their inputs.
Scalable Joins is a new implementation of the Joins concurrency library designed to scale on parallel hardware.
Even with protections like array-bounds checking in place, C programs remain vulnerable to errors in untrusted third-party libraries. Yarra is a new extension to C that uses a combination of static and dynamic techniques to protect the integrity of critical data structures in a C program from corruption by buggy libraries. We have used Yarra to harden applications ranging from SSH, FTP, and HTTP servers to memory allocators like BGET from non-control data attacks in third party code.
In the ecosystem of online advertising, advertisers are the real customs of the ad platform and we need to seriously consider how to better engage them. We are collaborating with AdCenter to develop advanced technologies for advertiser engagement, including the quality measure of advertisers’ campaigns and the estimation of their future performances. At the same time, we are pushing the research frontier of advertiser engagement.
Highly efficient codes for data storage and transmission.