Inference and Learning in Structured-Output Models for Computer Vision
A large number of problems in computer vision involve predictions over exponentially (or infinitely) large structured-output spaces, e.g. the space of segmentations of an image, the space of all object-part hierarchies in a context-free grammar, the space of all pixel-level depth-predictions, etc.
To build intelligent vision systems that can reason about these tasks, we must address four challenges: 1) representation: how do we store and represent beliefs over exponentially or infinitely large output spaces? 2) learning: how do we learn these beliefs from data? 3) inference: how do we predict under these beliefs? and 4) their interactions: the richer the model, the more difficult learning and inference under it become. In this talk, I will present a sampling of my recent work that addresses some of these challenges.
While a lot of progress has been made on the “static” version of the MAP inference problem, a number of situations require dynamic inference algorithms that must adapt and reorder computation to focus on “important” parts of the problem. I will present a novel measure for identifying such important parts of the problem and demonstrate how it is useful in speeding up inference algorithms in a variety of settings.
Next, I will talk about our recent work on the M-Best-Mode problem, which involves extracting not just the single most probable solution but a diverse set of M highly probable solutions in discrete graphical models (such as MRFs and CRFs). Extracting the top M modes of the distribution allows us to better exploit the beliefs that our model holds.
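To make the idea of a diverse set of solutions concrete, here is a minimal sketch on a toy chain-structured model. It is an illustration under stated assumptions, not the method presented in the talk: the abstract does not say how diversity is enforced, so the sketch assumes a simple greedy scheme in which each label used by an earlier solution is penalized by a hypothetical weight `lam` before MAP inference is re-run. The function names `viterbi` and `diverse_m_best` and all numbers are invented for the example.

```python
def viterbi(unary, pairwise):
    """Exact MAP labeling of a chain model (scores are maximized).

    unary[i][a]    : score of giving node i the label a
    pairwise[a][b] : score of neighboring nodes taking labels a then b
    """
    k = len(pairwise)
    score = list(unary[0])
    back = []  # back[i-1][b] = best label of node i-1 given node i has label b
    for i in range(1, len(unary)):
        new_score, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: score[a] + pairwise[a][b])
            new_score.append(score[best_a] + pairwise[best_a][b] + unary[i][b])
            ptr.append(best_a)
        score = new_score
        back.append(ptr)
    labels = [max(range(k), key=lambda b: score[b])]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels


def diverse_m_best(unary, pairwise, m, lam):
    """Greedy sketch (an assumption, see lead-in): after each solution,
    subtract lam from the unary score of every label it used, then re-solve."""
    scores = [list(row) for row in unary]  # copy so the input is not modified
    solutions = []
    for _ in range(m):
        y = viterbi(scores, pairwise)
        solutions.append(y)
        for i, label in enumerate(y):
            scores[i][label] -= lam
    return solutions


# Toy problem: 3 nodes, 2 labels, a smoothness prior favoring equal neighbors.
unary = [[2.0, 1.5], [2.0, 1.5], [2.0, 1.5]]
pairwise = [[0.5, 0.0], [0.0, 0.5]]
print(diverse_m_best(unary, pairwise, m=2, lam=1.0))  # [[0, 0, 0], [1, 1, 1]]
```

With `lam=0.0` the second call simply returns the MAP labeling again, which shows why some explicit dissimilarity term is needed to obtain distinct modes.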
Joint work with Pushmeet Kohli (MSRC), Vladimir Kolmogorov (IST), Sebastian Nowozin (MSRC), Greg Shakhnarovich (TTIC), Ashutosh Saxena (Cornell), Daniel Tarlow (UToronto) and Payman Yadollahpour (TTIC).
Speaker Details
Dhruv Batra is a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute affiliated with the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010 respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at Cornell University and MIT.
His research interests include computer vision, machine learning and applications of combinatorial optimization algorithms to learning and vision tasks. Specifically, he is interested in structured-output prediction, MAP inference in MRFs, max-margin methods, co-segmentation in multiple images, and interactive 3D modeling.
- Series: Microsoft Research Talks
- Speakers: Dhruv Batra
- Affiliation: Toyota Technological Institute at Chicago (TTIC)