Vision to Language

Recent advances in computer vision, natural language processing, and related areas have led to renewed interest in artificial intelligence applications that span multiple domains. In particular, interest in generating natural, human-like captions for images has grown dramatically. In this session, the speakers provide insight into this area. They describe several techniques that combine state-of-the-art computer vision methods with language models to produce surprisingly high-quality descriptions of visual content. They also emphasize both the limitations of current approaches and the challenges that lie ahead.

Speaker Details

C. Lawrence Zitnick is a principal researcher in the Interactive Visual Media group at Microsoft Research and an affiliate associate professor at the University of Washington. He is interested in a broad range of topics related to visual object recognition, language, and artificial intelligence. He developed the PhotoDNA technology used by Microsoft, Facebook, Google, and various law enforcement agencies to combat illegal imagery on the web. Before joining Microsoft Research, he received his PhD in robotics from Carnegie Mellon University in 2003.

Julia Hockenmaier is an associate professor in Computer Science at the University of Illinois at Urbana-Champaign. She works on natural language processing. Her current research focuses on automatic image description, statistical parsing, and unsupervised grammar induction. Julia received her PhD from the University of Edinburgh and did postdoctoral work at the University of Pennsylvania. She has received an NSF CAREER award and was shortlisted for the British Computer Society’s Distinguished Dissertation award.

Margaret Mitchell is a researcher in Microsoft’s NLP Research Group, working on grounded language generation. Before joining Microsoft, she was a postdoctoral researcher at The Johns Hopkins University Center of Excellence, where she worked on semantic role labeling and sentiment analysis using graphical models.

Richard Zemel is a Professor of Computer Science at the University of Toronto, where he has been a faculty member since 2000. Prior to that, he was an Assistant Professor of Computer Science and Psychology at the University of Arizona, and was a Postdoctoral Fellow at the Salk Institute and at Carnegie Mellon University. He received the B.Sc. degree in History & Science from Harvard University in 1984, and a Ph.D. in Computer Science from the University of Toronto in 1993. He has received several awards and honors, including a Young Investigator Award from the Office of Naval Research, and six Dean’s Excellence Awards at the University of Toronto. He is a Fellow of the Canadian Institute for Advanced Research, and a member of the NIPS Advisory Board. His research interests include topics in machine learning, vision, and neural coding. His recent research focuses on structured output models, image-text analysis, and fairness.

Series:: Microsoft Research Faculty Summit
Date:: July 17, 2015
Speakers:: Julia Hockenmaier, Larry Zitnick, Margaret Mitchell, and Richard Zemel
Affiliation:: Microsoft Research, University of Illinois at Urbana-Champaign, University of Toronto