B. Byun, A. Awasthi, P. A. Chou, A. Kapoor, B. Lee, and M. Czerwinski
We propose a novel system for analyzing the gestural and non-verbal cues of participants in video conferencing. These cues have previously been referred to as “honest signals” and are usually associated with the underlying cognitive state of the participants. The system extracts and analyzes a set of audio-visual, non-linguistic features in real time from the audio and video streams of two participants in a video conference. We show how these features can be used to compute indicators of the overall quality and type of the conversation being held. The system also provides visual feedback to the participants, who can then modify their conversational style to achieve the desired outcome of the video conference. Experiments on real-life data show that the system can predict the type of conversation with high accuracy using only non-linguistic signals. Qualitative user studies highlight the positive effects of participants' increased awareness of their own gestural and non-verbal cues.
In Int'l Conf. on Multimedia and Expo (ICME)
© 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.