Recognition of Multimodal Group Actions in Meetings

Iain McCowan, Daniel Gatica-Perez, Samy Bengio, Guillaume Lathoud



In this presentation we discuss recent work on the recognition of group actions in meetings. As background to our research, we present different perspectives on, and reasons for, researching meetings. We then consider several aspects of communication in meetings that motivate our computational approach to automatic meeting analysis. Specifically, our approach is based on three assumptions: (1) a significant amount of what matters in meetings can be extracted by modeling participant interactions, here called group actions; (2) group actions can be organized in semantic terms via multiple, parallel group action languages, each describing the content of a meeting from a particular semantic viewpoint; and (3) group actions can be recognized by extracting multimodal (i.e., audio-visual) features that measure the activity of individual participants, and by using generative sequence models. Such an approach poses several interesting challenges for machine learning, related to the modeling of multiple, interacting, asynchronous streams of multimodal data.


Longer abstract (PS file)


Talk slides (PDF file)


Technical Report (PDF file)


Return to Machine Learning and User Interface workshop page.