s u m i t * b a s u
s u m i t b @ m i c r o s o f t . c o m





I'm Sumit Basu, a Senior Researcher in the Machine Learning Department at Microsoft Research, Redmond. My research focus is on developing interactive, machine-learning-based power tools to assist users in understanding and extracting answers from complex data: teaching material/textbooks, computer systems, sensory signals like speech or music, scientific data, document collections, or the web. These power tools sometimes work by observing users as they perform a task, then assisting them in their efforts once the tools understand what's going on; in other cases (as in teaching) they provide inputs to the user and adaptively refine their strategy based on what works best. The interactive aspect comes from having humans in a tight loop with the learning algorithm: instead of getting a big batch of labeled data, interactive learning tasks involve a delicate dance between the human and the algorithm to achieve sufficient performance with a minimum of operator effort.

These days, I'm particularly interested in how we can use machine learning technologies to help human learners, teachers, and tutors at all levels with their educational goals. This ranges from the high school student who's studying for the SAT to the adult learner who wants to master the background information about a new company before her job interview. It's a deep and complex area, involving problems in document analysis, question generation, automatic grading, models of human memory, models of human understanding and skill level, and much more. If you're a bright graduate student interested in such problems and curious about internship opportunities, drop me a line!

a brief bio



I'm co-organizing a Workshop on Data-Driven Education at NIPS 2013 along with Jonathan Huang and Kalyan Veeramachaneni. We welcome your submissions and participation - abstracts are due October 16.

Here are the slides from our recent UW-MSR Colloquium Talk on "Making Reading More Effective" (November 2012), which gives an early overview of some of the work we are doing in the education space.

We recently presented our NAACL 2012 paper on generating quizzes from arbitrary text data, part of our efforts on teaching with machine learning.  The corpus from the paper as well as a video of the slideshow with narration will be available shortly.

current projects


Teaching with Machine Learning: using machine learning to help students and teachers of all ages and all types of educational goals achieve their objectives more effectively and efficiently.

Sho: a powerful interactive environment for scientific computing and prototyping based on IronPython. Find out more and download it here. Also check out this code for getting real-time skeleton data from Kinect in Sho.

earlier projects  

Songsmith: a songwriting tool that takes melodies and helps develop accompaniments for them. Based on this research with Dan Morris, it's now a product (with much help from the MSR Advanced Development Team). Check it out and download the trial here. It's also now free to many educational institutions via MSDN Academic Alliance and the Innovative Teachers' Network.

StickySorter: a tool for doing affinity diagramming and other flavors of information organization that I developed with Julie Guinn and Office Labs. You can download it here.

Music Analysis/Synthesis: using machine learning to help users understand, manipulate, and create music

Systems and Machine Learning: using machine learning to address problems in computer systems

Conversational Scene Analysis: seeking structure and content from conversational patterns

recent papers


Michael Brooks, Sumit Basu, Chuck Jacobs, and Lucy Vanderwende: "Divide and Correct: Using Clusters to Grade Short Answers at Scale." To Appear in Proceedings of the First ACM Learning at Scale Conference. March, 2014.

Sumit Basu, Chuck Jacobs, and Lucy Vanderwende. "Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading." Transactions of the ACL, 1 (October) 2013.

Sumit Basu and Janara Christensen. "Teaching Classification Boundaries to Humans."  In Proceedings of AAAI 2013.

Dengyong Zhou, John C. Platt, Sumit Basu, and Yi Mao. "Learning from the Wisdom of Crowds with Minimax Entropy."  Appears in Proceedings of NIPS 2012.

A. Kumaran, Sujay Kumar Jauhar, and Sumit Basu. "Doodling: A Gaming Paradigm for Generating Language Data." Appears in Proceedings of the Human Computation Workshop 2012.

Ashish Kapoor, Simon Baker, Sumit Basu, and Eric Horvitz. "Memory Constrained Face Recognition." In Proceedings of CVPR 2012.

Lee Becker, Sumit Basu, and Lucy Vanderwende. "Mind the Gap: Learning to Choose Gaps for Question Generation." In Proceedings of NAACL 2012.

Sumit Basu, John Dunagan, Kevin Duh, and Kiran-Kumar Muniswamy-Reddy. "Bilinear Logistic Regression for Factored Diagnosis Problems." Operating Systems Review, 45(3):31-38. December, 2011. Also presented at the SLAML workshop at SOSP 2011. [paper] [slides]

Steven M. Drucker, Danyel Fisher, and Sumit Basu. "Helping Users Sort Faster with Adaptive Machine Learning Recommendations." In Proceedings of INTERACT 2011.  September, 2011.

Sumit Basu, Danyel Fisher, Steven M. Drucker, and Hao Lu. "Assisting Users with Clustering Tasks by Combining Metric Learning and Classification."  In Proceedings of AAAI 2010.  July, 2010.

Andrew Guillory, Sumit Basu, and Dan Morris.  "User-Specific Learning for Recognizing a Singer's Intended Pitch."  In Proceedings of AAAI 2010. July, 2010.

Eric Nichols, Dan Morris, Sumit Basu, and Christopher Raphael.  "Relationships Between Lyrics and Melody in Popular Music."  In Proceedings of ISMIR 2009. October, 2009.

Alan Ritter and Sumit Basu.  "Learning to Generalize for Complex Selection Tasks."  In Proceedings of Intelligent User Interfaces 2009 (IUI '09).  January, 2009.  [Winner of Best Student Paper Award] [paper and video]

Eric Nichols, Dan Morris, and Sumit Basu.  "Data-Driven Exploration of Musical Chord Sequences."  In Proceedings of Intelligent User Interfaces 2009 (IUI '09).  January, 2009.

Michael Gamon, Sumit Basu, Dmitriy Belenko, Danyel Fisher, Matthew Hurst, and Arnd Christian König. "BLEWS - Using Blogs to Provide Context for News Articles." In Proceedings of the International Conference on Weblogs and Social Media (ICWSM). April, 2008. [paper] [video demo]

Dan Morris, Ian Simon, and Sumit Basu.  "Exposing Parameters of a Trained Dynamic Model for Interactive Music Creation."  In Proceedings of AAAI 2008.  June, 2008.

Ian Simon, Dan Morris, and Sumit Basu.  "MySong:  Automatic Accompaniment Generation for Vocal Melodies."  In Proceedings of Computer-Human Interaction 2008 (CHI '08).  April, 2008.

Sumit Basu, Surabhi Gupta, Milind Mahajan, Patrick Nguyen, and John C. Platt.  "Scalable Summaries of Spoken Conversations."  In Proceedings of Intelligent User Interfaces 2008 (IUI '08).  January, 2008. [slides]

Sumit Basu, John Dunagan, and Greg Smith.  "Why Did My PC Suddenly Slow Down?"  In Proceedings of the Systems and Machine Learning Workshop 2007.  April, 2007.

Ashish Kapoor, Eric Horvitz, and Sumit Basu.  "Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning."  In Proceedings of the Int'l. Joint Conf. on AI (IJCAI '07). January, 2007.

Karthik Gopalratnam, Sumit Basu, John Dunagan, and Helen Wang.  "Automatically Extracting Fields from Unknown Network Protocols."  In Proceedings of the Systems and Machine Learning Workshop 2006 (SysML'06).  June, 2006.

Sumit Basu.  "Acoustic Echo Cancellation in a Channel with Rapidly Varying Gain."  In Proceedings of the Int'l Conf. on Multimedia and Expo 2006 (ICME'06).  July, 2006.

Jay Stokes, John Platt, and Sumit Basu.  "Speaker Identification Using a Microphone Array and a Joint HMM with Speech Spectrum and Angle of Arrival."  In Proceedings of the Int'l Conf. on Multimedia and Expo 2006 (ICME'06).  July, 2006.

Nebojsa Jojic, Sumit Basu, and Nemanja Petrovic.  "Home Video Browsing and Consumption Through Exploration of a Learned Generative Model."   (Video Submission).  In Proceedings of CVPR'06.  June 2006.

Nemanja Petrovic, Aleksandar Ivanovic, Nebojsa Jojic, Sumit Basu, and Thomas Huang.  "Recursive Estimation of Generative Models of Video."   In Proceedings of CVPR'06.  June 2006.

Ian Simon, Sumit Basu, David Salesin, and Maneesh Agrawala.  "Audio Analogies: Creating New Music from an Existing Performance by Concatenative Synthesis."   In Proceedings of the Int'l Conf. on Comp. Music 2005.  August, 2005.

Ashish Kapoor and Sumit Basu.  "The Audio Epitome: A New Representation for Modeling and Classifying Auditory Phenomena."  In Proceedings of ICASSP 2005.  May, 2005.

Tanzeem Choudhury and Sumit Basu.  "Modeling Conversational Dynamics as a Mixed-Memory Markov Process."  In Proceedings of NIPS 2004. December, 2004.

Sumit Basu. "Mixing with Mozart."  In Proceedings of the Int'l. Comp. Music Conf. (ICMC) 2004.  Miami.  November, 2004.

T. Paek, M. Agrawala, S. Basu, S. Drucker, T. Kristjansson, R. Logan, K. Toyama, and A. Wilson.  "Toward Universal Mobile Interaction for Shared Displays."  In Proceedings of Computer Supported Cooperative Work (CSCW), pp. 266-269.  2004.

Nebojsa Jojic, Sumit Basu, Nemanja Petrovic, Brendan Frey, and Thomas Huang.  "Joint design of Data Analysis Algorithms and User Interface for Video Applications."  In Proceedings of the Machine Learning Meets the User Interface Workshop (MLUI) at NIPS 2003.  Vancouver, BC.  December, 2003. 

Tanzeem Choudhury, Brian Clarkson, Sumit Basu, and Alex Pentland. "Learning Communities: Connectivity and Dynamics of Interacting Agents."  In Proceedings of the International Joint Conference on Neural Networks (IJCNN'03), Special Session on Autonomous Mental Development. July 2003.

Sumit Basu. "A Linked-HMM Model for Voicing and Speech Detection."  In Proceedings of the IEEE Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003).  Hong Kong.  May, 2003.

For older papers, please check here.


community activities


Senior PC Member, Intelligent User Interfaces 2009 (IUI'09), 2010, 2011.

PC Member, AAAI 2011 NECTAR Track

Co-Chair (with Ashish Kapoor), Workshop on Analysis and Design of Algorithms for Interactive Machine Learning (ADA-IML'09) at NIPS 2009.

PC Member, IJCAI'09.

Co-Chair (with Armando Fox), Systems and Machine Learning Workshop 2008 (SysML 2008) at OSDI 2008.

Co-Chair (with Archana Ganapathi, Emre Kiciman, and Fei Sha), MLSys'07: Workshop on Statistical Learning Techniques for Systems Problems at NIPS 2007.

Publicity Chair, NIPS 2007.  You can see the poster I designed for the conference here.

PC Member, Systems and Machine Learning Workshop 2007 (SysML 2007) at NSDI 2007.

For fall quarter 2007, Emre Kiciman and I taught a graduate course on Systems Applications of Machine Learning (cse599n) at the University of Washington.

In other quarters, I co-taught the Markovia Seminar (cse590mv) on Machine Learning with Tanzeem Choudhury at the University of Washington.



  drawing/painting/illustration, singing/songwriting, and clothing design/modification.


  s u m i t b @ m i c r o s o f t . c o m, one microsoft way, redmond, wa 98052
@sumitsumit on Twitter
