Connecting Linguistics to Acoustics
Greg Kochanski (Oxford Phonetics)
Chilin Shih (University of Illinois)
Tan Lee (CUHK)
Hongyan Jing (IBM)
Jiahong Yuan (Cornell)
Is it phonetics or phonology or physiology?
How to build a mathematical model?
How *do* tone languages implement prosody?
Can we objectively assign an importance to a syllable?
How simple might English intonational phonology be?
What is the goal?
Explain intonation in a way that is:
Consistent with the most basic linguistic assumptions
Consistent with known physiology, biology, and physics.
bumps don’t match accents…
Basic assumptions used in modeling
People plan their utterances several syllables in advance.
People produce speech optimized to meet their needs.
A realistic model for the muscles that control f0.
Speech is planned.
People talk nearly as fast as possible.
Speech could be optimal
People want to minimize the chance that they will be misunderstood.
Risk = P(misinterpreted) * cost(misinterpreted)
People want to minimize effort and/or talk faster
How to combine the two?
A weighted sum.
We allow each syllable to have a different weight
Perhaps weight matches importance.
For s>>1, Error (R) dominates, and pitch matches target.
For s<<1, Effort (G) dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.
For s~1, everything compromises.
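The three regimes above can be reproduced with a minimal numerical sketch. The slides do not give the functional forms of G and R, so the ones used here are assumptions: effort G is taken as the sum of squared sample-to-sample pitch changes, error R is the squared deviation from a flat per-syllable target, and the strength s enters as a linear weight. The names `fit_pitch` and `samples_per_syllable` are hypothetical.

```python
import numpy as np

def fit_pitch(targets, strengths, samples_per_syllable=10):
    """Return the pitch curve that minimizes G + sum_i s_i * R_i.

    Assumed forms (not taken from the slides):
      G   = sum of squared sample-to-sample pitch changes (effort)
      R_i = sum of squared deviations from the target in syllable i (error)
    """
    # Expand per-syllable targets and strengths to one value per sample.
    y = np.repeat(np.asarray(targets, dtype=float), samples_per_syllable)
    w = np.repeat(np.asarray(strengths, dtype=float), samples_per_syllable)
    n = y.size
    # First-difference operator, shape (n-1, n): (D @ p)[i] = p[i+1] - p[i].
    D = np.diff(np.eye(n), axis=0)
    # The cost is quadratic in p; a zero gradient gives (D^T D + W) p = W y.
    A = D.T @ D + np.diag(w)
    return np.linalg.solve(A, w * y)

targets = [1.0, 3.0, 1.0]  # one flat target per syllable (illustrative values)
strong = fit_pitch(targets, [100.0, 100.0, 100.0])   # s >> 1 everywhere
weak_middle = fit_pitch(targets, [100.0, 0.001, 100.0])  # s << 1 in syllable 2
print(round(strong[15], 3), round(weak_middle[15], 3))
```

With large strengths the curve stays close to every target. When the middle syllable is weak, its target is largely ignored and the pitch interpolates smoothly between the strong neighbours.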
Q: Where did this “strength” come from?
A: What is 2 meters + 3 kilograms?
“Effort” can have energy units.
“Error” can be a pure number (error probability).
A multiplier is needed to make the units agree.
A: Strength = cost of a misinterpretation
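The units argument can be written out as a short dimensional check. This assumes the weighted sum has the form below. The symbol C for the total cost and the index i over syllables are introduced here for illustration and are not in the slides.

```latex
C = G + \sum_i s_i R_i,
\qquad [G] = \text{energy},
\quad [R_i] = 1
\;\Longrightarrow\; [s_i] = \text{energy}
```

Read this way, each term s_i R_i has the same shape as Risk = P(misinterpreted) * cost(misinterpreted), with R_i as the probability and s_i as the cost.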
Physical implementations of prosody
Intonation (pitch) is one of the more important components of prosody.
Also duration, loudness, facial expressions.
The rest of the model.
A model is a sequence of targets.
Each target has a strength.
One target per tone.
Targets are stretched to fit syllable duration.
Only one phonological rule: third-tone sandhi (33 → 23)
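The list above can be sketched as a small data pipeline: apply the one phonological rule, then stretch one target per tone to its syllable's duration and attach a strength. The tone shapes in `TONE_SHAPES` are made-up placeholders, not fitted values, and the rule is assumed to be the standard Mandarin third-tone sandhi (a tone 3 before another tone 3 becomes tone 2). All function names are hypothetical.

```python
import numpy as np

# Hypothetical tone templates on a normalized time axis (placeholder values).
TONE_SHAPES = {
    1: [1.0, 1.0, 1.0],  # high level
    2: [0.3, 0.5, 0.9],  # rising
    3: [0.3, 0.0, 0.2],  # low
    4: [1.0, 0.6, 0.1],  # falling
}

def apply_tone3_sandhi(tones):
    """Rewrite tone 3 as tone 2 when the next syllable is also tone 3.

    A simple left-to-right scan; runs of three or more tone 3s are
    handled naively and ignore prosodic grouping.
    """
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

def stretch_target(shape, n_samples):
    """Linearly resample a tone template to n_samples points."""
    shape = np.asarray(shape, dtype=float)
    src = np.linspace(0.0, 1.0, len(shape))
    dst = np.linspace(0.0, 1.0, n_samples)
    return np.interp(dst, src, shape)

def build_targets(tones, durations, strengths):
    """One target per tone, stretched to the syllable duration (in samples)."""
    tones = apply_tone3_sandhi(tones)
    return [
        {"tone": t, "strength": s, "target": stretch_target(TONE_SHAPES[t], n)}
        for t, n, s in zip(tones, durations, strengths)
    ]

model = build_targets(tones=[3, 3, 1], durations=[8, 12, 10], strengths=[1.0, 2.0, 0.5])
for syllable in model:
    print(syllable["tone"], syllable["strength"], len(syllable["target"]))
```

Keeping the rule as a separate symbolic step, before any pitch values are computed, mirrors the split between discrete targets and continuous strengths.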
Model fits to Mandarin Chinese
What’s the procedure?
Model fits for Mandarin Chinese
Strengths are stable under small changes in the model.
Metrical patterns inside words
Other nice properties
Intonation is represented as:
a small set of discrete symbols, in sequence,
modulated by a variable prosodic strength, with
a per-person or per-style shape for each symbol
One symbol per syllable seems enough
The basic mechanisms could be common across all languages.
The strength parameter seems real
Similar across languages
Matches language structure
But does it work for English?
The model for English
Model fits well over a range of speeds.
More fits: English confirming questions.
Why is the model so compact?
Similar Effects in other Articulators?