Information Geometry Sanjoy Dasgupta
This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions – which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an 'information projection': the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular informationtheoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss algorithms for it. 

Active Learning and Annotation Sanjoy Dasgupta
The 'active learning' model is motivated by scenarios in which it is easy to amass vast quantities of unlabeled data (images and videos off the web, speech signals from microphone recordings, and so on) but costly to obtain their labels. Like supervised learning, the goal is ultimately to learn a classifier. But the labels of training points are hidden, and each of them can be revealed only at a cost. The idea is to query just a few labels that are especially informative about the decision boundary, and thereby to obtain an accurate classifier at significantly lower cost than regular supervised learning. There are two distinct ways of conceptualizing active learning, which lead to rather different querying strategies. The first treats active learning as an efficient search through a hypothesis space of candidates, while the second has to do with exploiting cluster or neighborhood structure in data. This talk will show how each view leads to active learning algorithms that can be made efficient and practical, and have provable label complexity bounds that are in some cases exponentially lower than for supervised learning. 

Distributed Machine Learning Algorithms: CommunicationComputation Tradeoffs  Part 2 Sundararajan Sellamanickam
Distributed machine learning is an important area that has been receiving considerable attention from academic and industrial communities, as data is growing in unprecedented rate. In the first part of the talk, we review several popular approaches that are proposed/used to learn classifier models in the big data scenario. With commodity clusters priced on system configurations becoming popular, machine learning algorithms have to be aware of the computation and communication costs involved in order to be cost effective and efficient. In the second part of the talk, we focus on methods that address this problem; in particular, considering different data distribution settings (e.g., example and feature partitions), we present efficient distributed learning algorithms that tradeoff computation and communication costs. 

Distributed Machine Learning Algorithms: CommunicationComputation Tradeoffs  Part 1 Sundararajan Sellamanickam
Distributed machine learning is an important area that has been receiving considerable attention from academic and industrial communities, as data is growing in unprecedented rate. In the first part of the talk, we review several popular approaches that are proposed/used to learn classifier models in the big data scenario. With commodity clusters priced on system configurations becoming popular, machine learning algorithms have to be aware of the computation and communication costs involved in order to be cost effective and efficient. In the second part of the talk, we focus on methods that address this problem; in particular, considering different data distribution settings (e.g., example and feature partitions), we present efficient distributed learning algorithms that tradeoff computation and communication costs. 

Scaling Up Reinforcement Learning B. Ravindran
Distributed machine learning is an important area that has been receiving considerable attention from academic and industrial communities, as data is growing in unprecedented rate. In the first part of the talk, we review several popular approaches that are proposed/used to learn classifier models in the big data scenario. With commodity clusters priced on system configurations becoming popular, machine learning algorithms have to be aware of the computation and communication costs involved in order to be cost effective and efficient. In the second part of the talk, we focus on methods that address this problem; in particular, considering different data distribution settings (e.g., example and feature partitions), we present efficient distributed learning algorithms that tradeoff computation and communication costs. 

Reinforcement Learning: An Introduction B. Ravindran
Distributed machine learning is an important area that has been receiving considerable attention from academic and industrial communities, as data is growing in unprecedented rate. In the first part of the talk, we review several popular approaches that are proposed/used to learn classifier models in the big data scenario. With commodity clusters priced on system configurations becoming popular, machine learning algorithms have to be aware of the computation and communication costs involved in order to be cost effective and efficient. In the second part of the talk, we focus on methods that address this problem; in particular, considering different data distribution settings (e.g., example and feature partitions), we present efficient distributed learning algorithms that tradeoff computation and communication costs. 

Submodular Optimization and Machine Learning  Part 2 Stefanie Jegelka
Many problems in machine learning that involve discrete structures or subset selection may be phrased in the language of submodular set functions. The property of submodularity, also referred to as a 'discrete analog of convexity', expresses the notion of diminishing marginal returns, and captures combinatorial versions of rank and dependence. Submodular functions occur in a variety of areas including graph theory, information theory, combinatorial optimization, stochastic processes and game theory. In machine learning, they emerge in different forms as the potential functions of graphical models, as the utility functions in active learning and sensing, in models of diversity, in structured sparse estimation or network inference. The lectures will give an introduction to the theory of submodular functions, some applications in machine learning and algorithms for minimizing and maximizing submodular functions that exploit ties to both convexity and concavity. 

Submodular Optimization and Machine Learning  Part 1 Stefanie Jegelka
Many problems in machine learning that involve discrete structures or subset selection may be phrased in the language of submodular set functions. The property of submodularity, also referred to as a 'discrete analog of convexity', expresses the notion of diminishing marginal returns, and captures combinatorial versions of rank and dependence. Submodular functions occur in a variety of areas including graph theory, information theory, combinatorial optimization, stochastic processes and game theory. In machine learning, they emerge in different forms as the potential functions of graphical models, as the utility functions in active learning and sensing, in models of diversity, in structured sparse estimation or network inference. The lectures will give an introduction to the theory of submodular functions, some applications in machine learning and algorithms for minimizing and maximizing submodular functions that exploit ties to both convexity and concavity. 

Panel Q and A Prateek Jain, ChinJen Lin, Aditya Gopalan, Suvrit Sra, and Stefanie Jegelka


Introduction to largescale optimization  Part 2 Suvrit Sra
These lectures will cover both basics as well as cuttingedge topics in largescale convex and nonconvex optimization (continuous case only). Examples include stochastic convex optimization, variance reduced stochastic gradient, coordinate descent methods, proximalmethods, operator splitting techniques, and more. The lectures will also cover relevant mathematical background, as well as some pointers to interesting directions of future research. 
 Microsoft Research India
“Scientia”
196/36 2nd Main
Sadashivnagar, Bangalore 560 080
India