Kornel Laskowski and Elizabeth Shriberg
2012
Stochastic turn-taking models use a truncated representation of past
speech activity to specify how likely a speaker is to talk at the next
instant. An unanswered question in such modeling is how far back
to extend the conditioning context. We study this question using
Switchboard (English, telephone) and Spontal (Swedish, face-to-face)
conversations. We also explore whether to trade off precision
with range when moving backward in the history. We find that
(1) a nearly logarithmic compression of history is optimal, for both
speaker and interlocutor; (2) the absolute duration of the conditioning
context is at least 7 seconds; and (3) the compression scheme
generalizes remarkably well across the two different corpora.
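
To make the idea of a logarithmically compressed history concrete, here is a minimal sketch and not the authors' implementation. It assumes speech activity is available as a binary sequence of fixed-length frames (most recent frame last) and summarizes it with bins whose widths double when moving backward in time. The function names `log_bin_edges` and `compress_history`, the doubling rule, and the per-bin averaging are all illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Sequence


def log_bin_edges(n_bins: int) -> List[int]:
    """Bin boundaries, in frames before 'now', with widths 1, 2, 4, ...

    The doubling rule is an assumption chosen to illustrate a roughly
    logarithmic compression; it is not taken from the paper.
    """
    edges = [0]
    for k in range(n_bins):
        edges.append(edges[-1] + 2 ** k)
    return edges


def compress_history(activity: Sequence[int], n_bins: int) -> List[float]:
    """Summarize a binary speech-activity history as per-bin speech fractions.

    `activity` holds 0/1 frames with the most recent frame last. The
    result lists the fraction of speech frames in each bin, nearest bin
    first, so precision is highest for the recent past and coarsest for
    the distant past.
    """
    recent_first = list(reversed(activity))
    edges = log_bin_edges(n_bins)
    features = []
    for start, end in zip(edges[:-1], edges[1:]):
        window = recent_first[start:end]
        # Frames beyond the available history are treated as silence.
        features.append(sum(window) / (end - start))
    return features


if __name__ == "__main__":
    history = [0, 0, 0, 0, 1, 1, 0, 1]
    print(compress_history(history, n_bins=3))
```

In a two-party setting, the same compression would be applied separately to the speaker's own history and to the interlocutor's, and the two feature lists would together form the conditioning context. Under this doubling rule, `n_bins` bins cover `2 ** n_bins - 1` frames, so the range grows exponentially while the number of features grows only linearly.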
| Publisher | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
| Type | Inproceedings |
| Pages | 4937 - 4940 |