We use Stem-ML to build an automatic learning system for Mandarin prosody
that allows us to make quantitative measurements of prosodic strengths. Stem-ML
is a phenomenological model of the muscle dynamics and planning process that
controls the tension of the vocal folds. Because Stem-ML describes the interactions
between nearby tones or accents, we were able to use a highly constrained
model with only one accent template for each lexical tone category, and a
single prosodic strength per word. The model accurately reproduces the intonation
of the speaker, capturing 87% of the variance of the speech's fundamental
frequency, f0. The results reveal strong alternating
metrical patterns in words, and suggest that the speaker uses word strength
to mark a hierarchy of sentence, clause, phrase, and word boundaries.
Keywords: prosody, dynamics, modeling, intonation, tone, Mandarin Chinese
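
As a reading aid for the "87% of the variance" figure, the sketch below shows one common way to compute the fraction of variance that a model's f0 curve captures: one minus the ratio of the residual sum of squares to the total sum of squares. The abstract does not state the exact formula used, so this definition is an assumption. The function name `variance_explained` and the f0 values are hypothetical and are not taken from the study.

```python
def variance_explained(observed, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences.

    Assumed definition (coefficient of determination); the source
    does not say how its variance figure was computed.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed f0 around its mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up f0 samples in Hz, for illustration only.
f0_observed = [200.0, 220.0, 210.0, 190.0, 180.0]
f0_model = [202.0, 215.0, 212.0, 192.0, 179.0]

score = variance_explained(f0_observed, f0_model)
print(f"fraction of f0 variance captured: {score:.3f}")
```

Under this definition, a value of 0.87 would mean that the residual between the modeled and measured f0 carries 13% of the measured f0's variance.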