Prosody and Prosodic Models
ICSLP 2002 - September 16, 2002, Denver, Colorado
Chilin Shih and Greg Kochanski


Confirming a Word in a Question: Modeling Example

In this section we use one experiment to explore Stem-ML modeling: how to ask for confirmation of a particular word in English.

The Nature of the Problem

Question intonation (especially yes-no questions) in English typically has a rising tail. The rising gesture starts on the last stressed syllable.


When the emphasis is early, there is typically a "plateau" of high f0 values stretching from the stressed syllable to just before the end of the sentence. There is another final rise at the end of the sentence.
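
The contour shape described above can be sketched as one f0 target per syllable. This is only a schematic illustration, not Stem-ML and not a fit to the recorded data: the function name, the one-target-per-syllable simplification, and all Hz values are assumptions chosen for readability.

```python
def schematic_confirmation_f0(n_syllables, stressed_index,
                              base_hz=100.0, plateau_hz=160.0, final_hz=200.0):
    """Return one schematic f0 target (Hz) per syllable.

    Hypothetical helper: the three pitch levels are illustrative
    assumptions, not measured values.
    """
    if not 0 <= stressed_index < n_syllables:
        raise ValueError("stressed_index must name a syllable in the sentence")
    targets = []
    for i in range(n_syllables):
        if i == n_syllables - 1:
            # Final rise at the end of the sentence.
            targets.append(final_hz)
        elif i >= stressed_index:
            # High plateau from the stressed syllable to just before the end.
            targets.append(plateau_hz)
        else:
            # Material before the emphasis stays at the base level.
            targets.append(base_hz)
    return targets


# Early emphasis: a plateau followed by a separate final rise.
early = schematic_confirmation_f0(6, 1)
# Emphasis on the last syllable: only one rise remains.
late = schematic_confirmation_f0(4, 3)
```

With early emphasis the sketch yields a stretch of plateau targets before the final rise; with emphasis on the last syllable the plateau disappears and a single rise is left.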

This intonation contour occurs frequently in number confirmation, in both human-human and human-machine interaction. A dialogue system handling transactions may need to confirm a specific digit in a credit card number or phone number. The most natural and effective way to confirm that digit is to use rising intonation on the questionable digit, as in the 0 below.


It is desirable to have a good model of this intonation contour in a Text-to-Speech system to handle such dialogue acts.


We designed a small experiment to explore how to model this type of intonation contour.

The database consists of 200 digit sequences, organized in 16 blocks, with variations in phrasing, speaking speed, and the sentential position of the single digit being confirmed. We'd like to see how question intonation interacts with these factors. We also recorded declarative and yes-no question intonation as references.


[Figure Q0015.gif, with audio] Phrasing, as indicated by the dash, is clearly marked in declarative sentences: pitch rises on the phrase-initial digit.

[Figures Q0029.gif and Q0030.gif, with audio] Not surprisingly, declarative sentences end with falling pitch and questions end with rising pitch. This is a consistent difference between yes-no questions and declarative sentences.

[Figure Q0036.gif, with audio] Digit confirmation is marked with a strong rise and longer duration on the digit being confirmed. Post-confirmation pitch remains high. Pre-confirmation phrasing is similar to that of declarative sentences and yes-no questions.

[Figures Q0079.gif and Q0077.gif, with audio] There is an additional final rise in the confirmation sentences. However, when the confirmed digit is very close to the end, the confirmation rise and the final rise fuse together.

[Figures Q0049.gif and Q0073.gif, with audio] Post-confirmation pitch tends to be flatter in fast speech than in slow speech.

[Figure Q0067.gif, with audio] Post-confirmation accent returns after a while; this is especially clear in slow speech.

[Figure Q0039.gif, with audio] Post-confirmation phrasing is less obvious. However, when the phrasing structure is observable, new phrases are marked by a pitch drop, in contrast to the pitch rise in declarative sentences.

The digit immediately before the confirmed digit tends to get de-accented, as in the 1 of -1 +5. This is particularly clear when this digit starts a new phrase, where it would normally be marked with phrase-initial high pitch.

Phrasing is marked when there are at least two digits before the confirmed digit in the phrase, as in the 1 of -1 5 +0.

There seems to be a rhythmic consideration here. It appears that the speaker de-accents the phrase-initial digit to avoid putting strong phrases too close together.

[Figures Q0005.gif and Q0013.gif, with audio]
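
The pre-confirmation observation above can be written down as a small rule over annotated digit strings. This is a minimal sketch, not the authors' Stem-ML model. It assumes that a leading "-" marks the first digit of a phrase and a leading "+" marks the digit being confirmed, which is how the examples -1 +5 and -1 5 +0 read here. The function names and the record layout are hypothetical.

```python
def parse_digit_string(annotated):
    """Split a string such as '-1 5 +0' into per-digit records.

    Assumed notation: '-' prefix = starts a new phrase,
    '+' prefix = digit being confirmed.
    """
    digits = []
    for token in annotated.split():
        digit = token.lstrip("-+")
        prefix = token[:len(token) - len(digit)]
        if not digit.isdigit():
            raise ValueError(f"unexpected token: {token!r}")
        digits.append({
            "digit": digit,
            "starts_phrase": "-" in prefix,
            "confirmed": "+" in prefix,
        })
    return digits


def pre_confirmation_phrase_marking(digits):
    """For each phrase-initial digit before the confirmed digit, report
    whether its phrase-initial pitch marking is expected to survive.

    Rule sketched from the text: a phrase-initial digit immediately
    before the confirmed digit tends to be de-accented; otherwise the
    phrase-initial marking is kept.
    """
    marking = []
    seen_confirmed = False
    for i, d in enumerate(digits):
        if d["confirmed"]:
            seen_confirmed = True
        # Post-confirmation phrasing behaves differently, so skip it here.
        if seen_confirmed or not d["starts_phrase"]:
            continue
        next_is_confirmed = i + 1 < len(digits) and digits[i + 1]["confirmed"]
        marking.append((d["digit"], not next_is_confirmed))
    return marking


adjacent = pre_confirmation_phrase_marking(parse_digit_string("-1 +5"))
separated = pre_confirmation_phrase_marking(parse_digit_string("-1 5 +0"))
```

Here `adjacent` flags the 1 as de-accented and `separated` keeps its phrase-initial marking, mirroring the two examples in the text. The rule describes a tendency reported for one speaker, so it should be read as a descriptive summary and not as a categorical prediction.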
