The third annual New England Machine Learning Day will be held May 13th, 2014, at Microsoft Research New England, One Memorial Drive, Cambridge, MA 02142. The event will bring together local academics and researchers in machine learning and its applications. There will be a lively poster session during lunch, like NEML 2012 and NEML 2013.
Due to overwhelming interest, our event is now at capacity and registration has closed.
There will be a poster session in the afternoon.
For any questions, please contact MLday14@microsoft.com.
Agenda and abstracts
|9:50 - 10:00||
|10:00 - 10:30||
Joint Modeling of Imaging and Genetic Data
Polina Golland, MIT
We propose a unified Bayesian framework for detecting genetic variants associated with a disease while exploiting image-based features as an intermediate phenotype. Traditionally, imaging genetics methods comprise two separate steps. First, image features are selected based on their relevance to the disease phenotype. Second, a set of genetic variants is identified to explain the selected features. In contrast, our method performs these tasks simultaneously to ultimately assign probabilistic measures of relevance to both genetic and imaging markers. We derive an efficient approximate inference algorithm that handles the high dimensionality of imaging genetic data. We evaluate the algorithm on synthetic data and show that it outperforms traditional models. We also illustrate the application of the method in a study of Alzheimer's disease. Joint work with Kayhan Batmanghelich, Adrian Dalca, Mert Sabuncu.
|10:35 - 11:05||
"Perceptual Annotation": from Biologically Inspired, to Biologically Informed Machine Learning
David Cox, Harvard | Video recording
Many machine learning applications, explicitly or implicitly, attempt to mimic natural human abilities in a machine. Indeed, any setting where human-provided labels are used as ground truth – whether the system aspires to be biologically-inspired or not – is ultimately driven by the human visual and cognitive system and its ability to provide accurate exemplar labels. However, human-provided ground-truth labels are in many ways just the tip of the iceberg of the information that can be extracted from human judgments. I will describe a new approach -- called "perceptual annotation" -- in which we use an advanced online psychometric testing platform to acquire new kinds of human annotation data, and we incorporate these data directly into the formulation of a machine learning algorithm. A key intuition for this approach is that while it may be infeasible to dramatically increase the amount of data and high-quality labels available for the training of a given system, measuring the latent exemplar-by-exemplar landscape of difficulty and patterns of human errors can provide important information for regularizing the solution of the system at hand. Finally, I will conclude by exploring how this approach can be extended to incorporate an even greater diversity of biological data.
|11:10 - 11:40||
Adam Tauman Kalai, Microsoft Research New England | Video recording
English text today is often machine printed or displayed on screens using the same letters that were carved in stone and handwritten on wax and parchment over two thousand years ago. We consider the problem of developing radically different characters for the same underlying twenty-six letter English alphabet, just as Braille or cursive are alternative representations. We discuss optimizing these letters for multiple criteria using crowdsourcing and machine learning.
|11:40 - 1:45||
Lunch and posters
|1:45 - 2:15||
Learning to Act in Multiagent Sequential Environments
Michael Littman, Brown University | Video recording
From routing to online auctions, many decision-making tasks for learning agents are carried out in the presence of other decision makers. I will give a brief overview of results developed in the context of adapting reinforcement-learning algorithms to work effectively in multiagent environments. Of particular interest is the idea that even simple scenarios, such as the well-known Prisoner’s dilemma, require agents to work together, bearing some individual risk, to arrive at mutually beneficial outcomes.
|2:20 - 2:50||
Signals on Graphs: Efficient Detection & Recovery
Venkatesh Saligrama, Boston University
Several problems such as network intrusion, community detection, disease outbreak, and cell signaling can be described in terms of an attributed graph with signals associated with nodes and edges. In these applications, the presence of an intrusion, community, disease outbreak, or signal pathway is characterized by novel observations on some unknown connected subgraph. These problems can be formulated in terms of optimization of suitable objectives on connected subgraphs, a problem which is generally computationally difficult. We overcome the combinatorics of connectivity algebraically through embedding of connected subgraphs into linear matrix inequalities (LMI). Computationally efficient tests are then realized by optimizing convex objective functions subject to these LMI constraints. We show that our tests are minimax optimal for the exponential family of distributions and for graphs satisfying a polynomial growth property.
|2:50 - 3:20||
|3:20 - 3:50||
Redefining Class Definitions using Constraint-Based Clustering and its Application to Landcover Classification and the AAAI 2014 Keywords
Carla Brodley, Tufts
Two aspects are crucial when constructing any real-world supervised classification task: the set of classes whose distinction might be useful for the domain expert, and the set of classifications that can actually be distinguished by the data. Often a set of labels is defined with some initial intuition, but these are not the best match for the task. For example, labels have been assigned for land cover classification of the Earth, but it has been suspected that these labels are not ideal and that some classes may be best split into subclasses whereas others should be merged. We present an approach that formalizes this problem using three ingredients: the existing class labels, the underlying separability in the data, and input from the domain expert specifying an LxL matrix of pairwise probabilistic constraints expressing their beliefs as to whether the L classes should be kept separate, merged, or split. We describe how the problem can be solved by casting it as an instance of constraint-based clustering. We present results demonstrating its application to the task of redefining a class taxonomy for land cover classification of the Earth and redefining the set of high-level keywords for AAAI 2014.
|3:55 - 4:25||
Machine Learning for Clinical and Mobile Health
Ben Marlin, University of Massachusetts, Amherst
The effective analysis of emerging sources of complex clinical and mobile health data represents a key challenge for machine learning. These data often exhibit multiple complicating factors including sparse and irregular sampling, incompleteness, noise, non-stationary temporal dynamics, high levels of between-subjects variability, high volume, high velocity, and significant heterogeneity and multi-modality. In this talk, I will present an overview of some of the machine learning problems my research group is currently working on, motivated by ongoing collaborations in both clinical and mobile health. These problems include modeling and prediction of sparse and irregularly sampled physiological time series data from intensive care unit electronic health records, feature extraction and event detection from noisy wearable on-body sensor data, and learning what to sense in the energy- and computation-constrained mobile device setting.
|4:30 - 5:00||
Towards More Human-Like Machine Learning
Joshua B Tenenbaum, MIT
How can we build a machine that learns to see the world as a human being does? This question has been at the heart of the fields of AI and machine learning since their inceptions. Recently the question has seen renewed interest from researchers taking various "big data" approaches, such as training many-layered neural networks to find structure in millions of images, or mining the web to build databases of millions of common-sense facts. I will talk about our recent work taking a different approach, based on trying to reverse-engineer the core cognitive capacities and learning mechanisms of young children and infants. In contrast to conventional big-data ML approaches, children parse their experience using rich causally structured generative models, and learn new models from very little evidence; often just a single example is sufficient to grasp a new concept and generalize in richer ways than machine learning systems can typically do even with hundreds or thousands of examples. I will show how we are beginning to capture these perception and learning abilities in computational terms using techniques based on probabilistic programs and program induction, embedded in a broadly Bayesian framework for inference under uncertainty.
Organizers
Ryan Adams (Computer Science, Harvard)
Sham Kakade (Microsoft Research New England)
Lorenzo Rosasco (Universita' di Genova and Massachusetts Institute of Technology)
Stefanie Tellex (Computer Science, Brown)
The steering committee that selects the organizers of ML Day each year consists of Sham Kakade, Adam Kalai, and Joshua Tenenbaum.
Date & time
Tuesday, May 13, 2014
Registration and breakfast begin at 9 a.m.
Microsoft Research New England
Horace Mann Conference Room
First Floor Conference Center
One Memorial Drive, Cambridge, MA
Upon arrival, be prepared to show a picture ID and sign the Building Visitor Log when approaching the Lobby Floor Security Desk. Alert them to the name of the event you are attending and ask them to direct you to the appropriate floor. The talks will be held in the First Floor Conference Center, Horace Mann Conference Room.
Hospitality Notice for University and Government Employees: Microsoft Research is providing hospitality at this event. Please consult with your institution to determine whether you can accept meals and other hospitality under your institution's ethics rules and any other laws that might apply. By accepting our invitation, you confirm that this invitation is compliant with your institution's policies.
Please email the conference organizers at MLday14@microsoft.com.