Speaker: Florian Metze
Host: Daniel Povey
Affiliation: Carnegie Mellon University
Date recorded: 11 January 2012
In this talk, we present results on applying a personality assessment paradigm to speech input and compare human and automatic performance on this task. We cue a professional speaker to produce speech according to different personality profiles and encode the resulting vocal personality impressions in terms of the Big Five personality traits of the NEO-FFI. Human raters who do not know the speaker then estimate the five factors. We analyze the recordings using signal-based acoustic and prosodic methods and observe high consistency between the acted personalities, the raters’ assessments, and initial automatic classification results. We further validate the application of our paradigm to speech input and extend it to text-independent speech. We show that human labelers can consistently label the personality of speech data recorded across multiple sessions. We also investigate which of the five NEO-FFI scales can be assessed from speech, and how manipulating one scale influences the perception of another. Finally, we present a top-down clustering of human personality labels derived from speech, which will be useful in future experiments on the automatic classification of personality traits. This represents a first step towards handling personality traits in speech, which we envision will be used in future voice-based communication between humans and machines.
©2012 Microsoft Corporation. All rights reserved.