Chinese Intonation: Connecting Linguistics to Acoustics
Greg Kochanski (Oxford Phonetics)
Chilin Shih (University of Illinois)
Tan Lee (CUHK)
Hongyan Jing (IBM)
Jiahong Yuan (Cornell)
Questions
Is it phonetics or phonology or physiology?
How to build a mathematical model?
How *do* tone languages implement prosody?
Can we objectively assign an importance to a syllable?
How simple might English intonational phonology be?
What is the goal?
Explain intonation in a way that is:
Consistent with the most basic linguistic assumptions
Falsifiable
Reductionist
Consistent with known physiology, biology, and physics
Existing work
But F0 bumps don’t match accents…
The Challenge
Another Challenge
Basic assumptions used in modeling
People plan their utterances several syllables in advance.
People produce speech optimized to meet their needs.
A realistic model of the muscles that control F0
Speech is planned.
People talk nearly as fast as possible.
Speech could be optimal
Optimize what?
People want to minimize the chance that they will be misunderstood.
Risk = P(misinterpreted) * cost(misinterpreted)
People want to minimize effort and/or talk faster
Chairs, Cars
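The risk formula above is plain arithmetic. The sketch below evaluates it for a careless and a careful rendition of the same syllable; the function name and all numbers are hypothetical, chosen only for illustration.

```python
def expected_risk(p_misinterpreted: float, cost_misinterpreted: float) -> float:
    """Risk = P(misinterpreted) * cost(misinterpreted); hypothetical helper."""
    if not 0.0 <= p_misinterpreted <= 1.0:
        raise ValueError("p_misinterpreted must be a probability in [0, 1]")
    return p_misinterpreted * cost_misinterpreted


# Hypothetical numbers: a careless syllable misheard 20% of the time and a
# careful one misheard 5% of the time, both with a cost of 3 units.
print(expected_risk(0.20, 3.0))
print(expected_risk(0.05, 3.0))
```

Careful speech lowers the risk, but it is also more effortful, which is the other half of the trade-off.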
How to combine the two?
A weighted sum.
We allow each syllable to have a different weight
Perhaps weight matches importance.
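The weighted sum can be written as a small function. This is a sketch under stated assumptions: effort is a single number for the whole utterance, each syllable contributes one error value, and the names are hypothetical. It shows why per-syllable weights matter: the same error costs more on a heavily weighted syllable.

```python
from typing import Sequence


def total_cost(effort: float, errors: Sequence[float],
               weights: Sequence[float]) -> float:
    """Effort plus a weighted sum of per-syllable errors (hypothetical helper)."""
    if len(errors) != len(weights):
        raise ValueError("need exactly one weight per syllable")
    return effort + sum(w * e for w, e in zip(weights, errors))


# Three syllables; the middle one gets the largest weight (most important).
print(total_cost(effort=2.0, errors=[0.5, 0.25, 1.0], weights=[1.0, 4.0, 0.5]))  # 4.0
```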
Modeling math
“Effort”
Modeling math
Model behavior
For s>>1, Error (R) dominates, and pitch matches target.
For s<<1, Effort (G) dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.
For s~1, everything compromises.
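The three regimes can be reproduced with a small numerical sketch. It is not the authors' model. As assumptions, the effort term G is replaced by the sum of squared differences between neighboring pitch samples, and each syllable's error is its strength times the squared distance from a constant target. With both terms quadratic, the minimum solves a tridiagonal linear system. The names `fit_contour` and `middle_mean` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (start, stop, target, strength): the syllable covers grid points start..stop-1.
Syllable = Tuple[int, int, float, float]


def solve_tridiagonal(sub: Sequence[float], diag: Sequence[float],
                      sup: Sequence[float], rhs: Sequence[float]) -> List[float]:
    """Thomas algorithm; row i reads sub[i]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        denom = diag[i] - (sub[i] * c[i - 1] if i > 0 else 0.0)
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (sub[i] * d[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def fit_contour(n_points: int, syllables: Sequence[Syllable]) -> List[float]:
    """Minimize effort + sum of strength-weighted errors over a pitch contour p."""
    diag = [0.0] * n_points
    sub = [0.0] * n_points
    sup = [0.0] * n_points
    rhs = [0.0] * n_points
    # Assumed effort: sum_t (p[t+1] - p[t])**2, which contributes a chain Laplacian L.
    for t in range(n_points - 1):
        diag[t] += 1.0
        diag[t + 1] += 1.0
        sup[t] = -1.0
        sub[t + 1] = -1.0
    # Assumed error: strength * (p[t] - target)**2 on each point of the syllable.
    for start, stop, target, strength in syllables:
        for t in range(start, stop):
            diag[t] += strength
            rhs[t] += strength * target
    # Zero gradient: (L + S) p = S y, with S the diagonal of strengths.
    return solve_tridiagonal(sub, diag, sup, rhs)


def middle_mean(strength: float) -> float:
    """Mean pitch of a low-target syllable sandwiched between two high targets."""
    sylls = [(0, 10, 1.0, 10.0), (10, 20, 0.0, strength), (20, 30, 1.0, 10.0)]
    p = fit_contour(30, sylls)
    return sum(p[10:20]) / 10


for s in (1000.0, 1.0, 0.001):
    print(f"s = {s:g}: mean pitch on middle syllable = {middle_mean(s):.3f}")
```

For large s the middle syllable sits near its low target, for s near 1 it takes an intermediate value, and for small s the contour simply interpolates between the two strong high neighbors.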
Q: Where did this “strength” come from?
A: What is 2 meters + 3 kilograms?
“Effort” can have energy units.
“Error” can be a pure number (error probability).
A multiplier is needed to make the units agree.
A: Strength = cost of a misinterpretation
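The units argument can be written out. Assume the weighted sum has the form below, with G the effort, R_i the error on syllable i, and s_i its strength (the symbol \mathcal{C} for the total is introduced here only for illustration; the slide's own formula is not reproduced). If G is an energy and R_i is a pure number, then:

```latex
\mathcal{C} = G + \sum_i s_i \, R_i ,
\qquad
[s_i] = \frac{[G]}{[R_i]} = \frac{\mathrm{J}}{1} = \mathrm{J}
```

So s_i carries energy units. It acts as the exchange rate between effort and error probability, which matches reading strength as the cost of a misinterpretation.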
Physical implementations of prosody
Intonation (pitch) is one of the more important components of prosody.
Also duration, loudness, facial expressions.
Modeling math
The rest of the model.
A model is a sequence of targets.
Each target has a strength.
One target per tone.
Targets are stretched to fit syllable duration.
Only one phonological rule: 33 → 23 (Mandarin third-tone sandhi)
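A minimal sketch of how a tone sequence might become a sequence of stretched targets, one per syllable. Several assumptions apply. The target shapes are Chao tone numerals (55, 35, 214, 51), standing in for the per-speaker shapes the model actually fits. The sandhi rule is applied naively left to right, which ignores how word structure affects runs of three or more third tones. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Placeholder target shapes from Chao tone numerals (assumption, not fitted values).
CHAO_SHAPES: Dict[int, List[float]] = {
    1: [5.0, 5.0],
    2: [3.0, 5.0],
    3: [2.0, 1.0, 4.0],
    4: [5.0, 1.0],
}


def apply_third_tone_sandhi(tones: Sequence[int]) -> List[int]:
    """The 33 -> 23 rule: a tone 3 before another tone 3 surfaces as tone 2.

    Applied naively left to right (assumption); real outcomes for longer runs
    of third tones depend on word and phrase structure.
    """
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def stretch(shape: Sequence[float], n_points: int) -> List[float]:
    """Linearly resample an evenly spaced target shape onto n_points samples."""
    if n_points < 2 or len(shape) < 2:
        raise ValueError("need at least two points in both shape and output")
    k = len(shape)
    out = []
    for j in range(n_points):
        x = j * (k - 1) / (n_points - 1)  # position measured in shape-index units
        lo = min(int(x), k - 2)
        frac = x - lo
        out.append(shape[lo] * (1.0 - frac) + shape[lo + 1] * frac)
    return out


def build_targets(tones: Sequence[int], durations: Sequence[int],
                  strengths: Sequence[float]) -> List[Tuple[int, List[float], float]]:
    """One (surface tone, stretched target, strength) triple per syllable."""
    surface = apply_third_tone_sandhi(tones)
    return [(t, stretch(CHAO_SHAPES[t], d), s)
            for t, d, s in zip(surface, durations, strengths)]


# "ni3 hao3" (hello): the first syllable surfaces with tone 2.
for tone, target, strength in build_targets([3, 3], [4, 6], [1.0, 2.0]):
    print(tone, [round(v, 2) for v in target], strength)
```

Stretching the same shape to each syllable's duration keeps the symbol inventory small: duration is handled geometrically, not by adding new symbols.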
Model fits to Mandarin Chinese
What’s the procedure?
Model fits for Mandarin Chinese
Strengths are stable under small changes in the model.
Model parameters
Metrical patterns inside words
Other nice properties
Local Conclusion
Intonation is represented as:
 a small set of discrete symbols, in sequence,
modulated by a variable prosodic strength, with
a per-person or per-style shape for each symbol
One symbol per syllable seems enough
The basic mechanisms could be common across all languages.
The strength parameter seems real
Similar across languages
Matches language structure
But does it work for English?
English
The model for English
Model details
Model fits well over a range of speeds.
More fits: English confirming questions.
Local conclusion
Why is the model so compact?
Similar Effects in other Articulators?
Conclusion