Soroosh Mariooryad, Anitha Kannan, Dilek Hakkani-Tur, and Elizabeth Shriberg
Recent studies have shown the importance of using online videos along with textual material in educational instruction, especially for better content retention and improved concept understanding. A key question is how to select videos to maximize student engagement, particularly when there are multiple possible videos on the same topic. While there are many aspects that drive student engagement, in this paper we focus on presenter speaking styles in the video. We use crowd-sourcing to explore speaking style dimensions in online educational videos, and identify six broad dimensions: liveliness, speaking rate, pleasantness, clarity, formality and confidence. We then propose techniques based solely on acoustic features for automatically identifying a subset of the dimensions. Finally, we perform video re-ranking experiments to learn how users apply their speaking style preferences to augment textbook material. Our findings also indicate how certain dimensions are correlated with perceptions of general pleasantness of the voice.
In International Conference on Acoustics, Speech and Signal Processing
Publisher International Conference on Acoustics, Speech, and Signal Processing (ICASSP)