Microsoft Research India Theory Day featured a one-day workshop focusing on algorithms and cryptography, jointly organized by Microsoft Research India and the Indian Institute of Science (IISc).

## About India Theory Day

Microsoft Research India Theory Day was held in Bangalore on December 22, 2007. It featured a one-day workshop focusing on algorithms and cryptography. The MSR India Theory Day 2007 was jointly organized by Microsoft Research India and the Indian Institute of Science (IISc).

An eminent panel of speakers addressed the workshop. The speakers included Adi Shamir, Madhu Sudan, Umesh Vazirani, Santosh Vempala and Ravi Kannan. They presented the latest research developments in theoretical computer science in a manner accessible to advanced students and researchers.

## Speakers

**Prof. Umesh Vazirani (UC Berkeley)****Prof. Adi Shamir (Weizmann Institute of Science)****Dr. Ravi Kannan (Microsoft Research)****Prof. Santosh Vempala (Georgia Tech)****Prof. Madhu Sudan (MIT)**

## Abstracts

**Expander Flows, Graph Spectra and Graph Separators**- Umesh Vazirani

A graph separator is a set of edges whose deletion partitions the graph into two (or more) pieces. Finding small graph separators is a fundamental combinatorial problem with numerous applications. Interest in it also derives from its theoretical connections to spectral methods, linear/semi-definite programming, high dimensional geometry and measure concentration. In this talk I will speak about very fast approximation algorithms for this problem, that are almost as fast as max-flow computations. The analysis of these algorithms is based on a cut-matching game, a new embedding of the graph in R^n called the walk-embedding, and the notion of {\em expander flows}, introduced in [ARV], which constitute an interesting "certificate" of a graph's expansion.

**Multivariate Cryptography**- Adi Shamir

The security of the RSA cryptosystem is based on the difficulty of solving a single algebraic equation in one variable over a large domain such as the integers modulo n=pq. The security of multivariate cryptosystems is based on the difficulty of solving many algebraic equations in many variables over a small domain such as GF(2). The best known multivariate scheme is SFLASH, which is basically an obfuscated variant of RSA with many variables. It is very efficient, and was selected in 2003 by the European NESSIE project as one of only three recommended signature schemes, and as the one most suitable for constrained devices. In this talk I will describe a new type of mathematical attack on mutivariate schemes which can break SFLASH with its largest recommended parameters in a few seconds on a single PC. This is joint work with Dubois, Fouque, and Stern. The talk will be self contained, requiring only basic knowledge about the structure of finite fields.

**Generative Models, Planted Problems****- Ravi Kannan**

The input data to many problems on Graphs, Clustering, Learning etc. is a matrix. Often the problems are hard in the worst-case. To study the average-case, one needs a generative model. A natural model is to assume "full-independence" – i.e., the entries of the matrix are chosen independently. This facilitates analysis because of deep classical theorems. We will discuss this as well as recent work on making do with the much more realistic assumption of partial independence, where only the columns are assumed to be independent of each other. Some mention will be made of "planted" models where in addition to the random structure, there is an artificial "plant" and the objective is to find it.

**Spectral Algorithms for Learning and Clustering: PCA and Beyond****- Santosh Vempala **

The spectrum of a matrix (or a graph) captures many interesting properties in surprising ways. Spectral methods (based on eigenvalues and eigenvectors) are already used for unsupervised learning, image segmentation, to improve precision and recall in databases and broadly for information retrieval. The common component of these methods is a projection to the space of a few singular vectors of the data. In this talk, we describe this idea via its role in efficient algorithms for (a) clustering object-feature (document-term) matrices and (b) the classical problem of learning a mixture of Gaussians. We highlight the apparent limit of the method for the latter, then present a new variant of principal component analysis (which we call isotropic PCA) that is able to go significantly beyond this limit. In particular, to classify a sample from a mixture of two arbitrary Gaussians one only needs to assume that they are almost separable by a hyperplane.

**Towards Universal Semantic Communication****- Madhu Sudan**

Is it possible for two intelligent players to communicate meaningfully with each other, without any prior common background? What does it even mean for the two players to understand each other? In addition to being an intriguing question in its own right, we argue that this question also goes to the heart of modern communication infrastructures, where misundestandings (mismatches in protocols) between communicating players are a major source of errors. We believe that questions like this need to be answered to set the foundations for a robust theory of (meaningful) communication. In this talk, I will describe what computational complexity has to say about such interactions. Most of the talk will focus on how some of the nebulous notions, such as intelligence and understanding, should be defined in concrete settings. We assert that in order to communicate "successfully", the communicating players should be explicit about their goals - what the communication should achieve. We show examples that illustrate that when goals are explicit the communicating players can achieve meaningful communication. Joint work with Brendan Juba (MIT).