|
|
|
|
|
Greg Kochanski (Oxford Phonetics) |
|
Chilin Shih (University of Illinois) |
|
Tan Lee (CUHK) |
|
Hongyan Jing (IBM) |
|
Jiahong Yuan (Cornell) |
|
|
|
|
Is it phonetics or phonology or physiology? |
|
|
|
How to build a mathematical model? |
|
|
|
How *do* tone languages implement prosody? |
|
|
|
Can we objectively assign an importance to a
syllable? |
|
|
|
How simple might English intonational phonology
be? |
|
|
|
|
|
Explain intonation in a way that is: |
|
Consistent with the most basic linguistic
assumptions |
|
Falsifiable |
|
Reductionist |
|
Consistent with known physiology, biology, and
physics. |
|
|
|
|
|
|
|
|
|
|
|
|
People plan their utterances several syllables
in advance. |
|
People produce speech optimized to meet their
needs. |
|
A realistic model for the muscles that control f0 |
|
|
|
|
|
|
|
|
People want to minimize the chance that they
will be misunderstood. |
|
Risk = P(misinterpreted) * cost(misinterpreted) |
|
|
|
People want to minimize effort and/or talk
faster |
|
Chairs, Cars |
|
|
|
How to combine the two? |
|
A weighted sum. |
|
We allow each syllable to have a different
weight |
|
Perhaps weight matches importance. |
|
|
|
|
|
|
|
For s>>1, Error (R) dominates, and pitch
matches target. |
|
|
|
For s<<1, Effort (G) dominates, both
speaker and listener accept large deviations, and pitch smoothly
interpolates. |
|
|
|
For s~1, everything compromises. |
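The three regimes above can be reproduced with a small numerical sketch. It minimizes a weighted sum G + sum_i s_i * R_i over a sampled pitch curve. The specific choices are assumptions made for illustration, not the model's actual definitions: effort G is taken as the sum of squared pitch changes between adjacent samples, error R_i as the squared deviation from a flat per-syllable target, and the name `optimal_pitch` is hypothetical.

```python
def optimal_pitch(targets, strengths, samples_per_syllable=10):
    """Return the pitch curve minimizing G + sum_i s_i * R_i.

    Assumed for illustration (not taken from the slides):
      G   = sum of squared differences between adjacent pitch samples
      R_i = sum of squared deviations from syllable i's flat target
    Needs at least two samples in total.
    """
    # Expand per-syllable values to one value per sample.
    target = [float(t) for t in targets for _ in range(samples_per_syllable)]
    weight = [float(s) for s in strengths for _ in range(samples_per_syllable)]
    n = len(target)

    # Setting the gradient to zero gives a tridiagonal linear system:
    # diagonal = (1 or 2) + weight, off-diagonals = -1, rhs = weight * target.
    diag = [(1.0 if k in (0, n - 1) else 2.0) + weight[k] for k in range(n)]
    rhs = [weight[k] * target[k] for k in range(n)]

    # Thomas algorithm: forward elimination.
    for k in range(1, n):
        m = -1.0 / diag[k - 1]
        diag[k] += m              # subtract m * (upper off-diagonal = -1)
        rhs[k] -= m * rhs[k - 1]

    # Back substitution.
    pitch = [0.0] * n
    pitch[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        pitch[k] = (rhs[k] + pitch[k + 1]) / diag[k]
    return pitch


# Three syllables with targets high, low, high.
strong = optimal_pitch([1.0, 0.0, 1.0], [100.0, 100.0, 100.0])
weak_middle = optimal_pitch([1.0, 0.0, 1.0], [100.0, 0.01, 100.0])

# With s >> 1 everywhere, the middle syllable reaches its low target.
# With s << 1 on the middle syllable, pitch stays close to its neighbors.
print(round(strong[15], 3), round(weak_middle[15], 3))
```

Giving each syllable its own strength is what lets a weak syllable be carried along by strong neighbors while a strong syllable keeps its target shape.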
|
|
|
|
|
A: What is 2 meters + 3 kilograms? |
|
“Effort” can have energy units. |
|
“Error” can be a pure number (error
probability). |
|
A multiplier is needed to make the units agree. |
|
|
|
A: Strength = cost of a misinterpretation |
|
|
|
|
Intonation (pitch) is one of the more important
components of prosody. |
|
|
|
Also duration, loudness, facial expressions. |
|
|
|
|
|
|
|
A model is a sequence of targets. |
|
Each target has a strength. |
|
One target per tone. |
|
Targets are stretched to fit syllable duration. |
|
Only one phonological rule: 33→23 |
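As a rough sketch of the representation listed above: the code below turns a tone sequence into one target per tone, stretches each target to its syllable duration, and applies the single rule first. Several things here are assumptions for illustration only. The rule is read as Mandarin third-tone sandhi applied pairwise from left to right, the shapes in `TONE_SHAPES` are textbook Chao tone values standing in for fitted per-speaker shapes, stretching is done by linear interpolation, and all function names are hypothetical.

```python
# Placeholder tone shapes on the 5-level Chao scale (assumed, not fitted).
TONE_SHAPES = {1: [5, 5], 2: [3, 5], 3: [2, 1, 4], 4: [5, 1]}


def apply_sandhi(tones):
    """Apply 33 -> 23 to each adjacent pair, scanning left to right."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def stretch_target(shape, n_samples):
    """Linearly resample a shape (two or more points) to n_samples points."""
    if n_samples == 1:
        return [float(shape[0])]
    last = len(shape) - 1
    out = []
    for k in range(n_samples):
        x = k * last / (n_samples - 1)   # position along the original shape
        i = min(int(x), last - 1)        # left neighbor index
        frac = x - i
        out.append(shape[i] * (1 - frac) + shape[i + 1] * frac)
    return out


def build_targets(tones, durations):
    """One stretched target per tone; durations are in samples."""
    surface = apply_sandhi(tones)
    return [stretch_target(TONE_SHAPES[t], d)
            for t, d in zip(surface, durations)]


# Two third tones in a row: the first surfaces with the tone-2 shape.
for target in build_targets([3, 3], [4, 6]):
    print([round(v, 2) for v in target])
```

Each stretched target would then be paired with a strength before the pitch curve is computed.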
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Intonation is represented as: |
|
a small set of discrete symbols, in sequence, |
|
modulated by a variable prosodic strength, with |
|
a per-person or per-style shape for each symbol |
|
|
|
One symbol per syllable seems enough |
|
|
|
The basic mechanisms could be common across all
languages. |
|
|
|
The strength parameter seems real |
|
Similar across languages |
|
Matches language structure |
|
|
|
|
|
|
|
|
|
|
|
|