Prosody and Prosodic Models
ICSLP 2002 - September 16, 2002, Denver Colorado
Greg Kochanski and Chilin Shih

We know that all aspects of prosody are controlled by muscle actions, and that the mapping between muscle activation and perceived prosody is not strongly nonlinear.

Neglecting segmental effects, the factors that influence the pitch are the vocal fold tension (Ohala and Ladefoged, 1970) and the subglottal pressure (Monsen et al., 1978). Both are smoothly changing functions of time, governed by nerve impulses, Newtonian mechanics, and the viscoelasticity of tissue. The overall relationship between muscle activation and pitch is smooth and nearly linear, and the effects of the different muscles combine into smooth frequency changes.

Detailed physiological models for f0 are described in Titze (1993a) and references therein. See also the discussion of the "Cover model" in Titze (1993b) for an example of how the activities of the thyroarytenoid and cricothyroid muscles combine. Similar calculations involving the lung pressure also show a smooth dependence that is not strongly nonlinear.

It follows that we can assume the prosodic trajectory is continuous and smooth over short time scales. There are therefore smooth and predictable connections between neighboring accents, because muscles simply cannot change position discontinuously. The muscles that control the larynx cannot respond in less than 100 ms (Stevens, 1998, pp. 40-48 and references therein; Xu and Sun, 2000), a time that is only slightly shorter than a typical syllable, so we expect the intonation of neighboring syllables to interact. This interaction should be important in all languages.
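
The interaction between neighboring syllables can be illustrated with a minimal sketch. It is not one of the models cited here; it simply assumes that the combined muscle response behaves like a critically damped second-order system driven by one constant pitch target per syllable. The function name smooth_pitch, the 150 ms syllable length, the targets in Hz, and the choice of a natural frequency of 4 / response_s (which lets a step reach roughly 90% of its size within the response time) are all assumptions made for illustration.

```python
def smooth_pitch(targets, syllable_s=0.15, response_s=0.1, dt=0.001):
    """Smooth a staircase of per-syllable pitch targets (illustrative only).

    The "muscle" is modeled as a critically damped second-order system:
        x'' = omega**2 * (target - x) - 2 * omega * x'
    Every parameter value here is an assumption, not a measured quantity.
    """
    omega = 4.0 / response_s             # assumed natural frequency, rad/s
    n_per = round(syllable_s / dt)       # samples per syllable
    x, v = targets[0], 0.0               # start at rest on the first target
    f0 = []
    for target in targets:
        for _ in range(n_per):
            # Semi-implicit Euler step: update velocity, then position.
            v += dt * (omega ** 2 * (target - x) - 2.0 * omega * v)
            x += dt * v
            f0.append(x)
    return f0


if __name__ == "__main__":
    # Three syllables with low, high, low targets (values in Hz are made up).
    short = smooth_pitch([100.0, 140.0, 100.0])
    slow = smooth_pitch([100.0, 140.0, 100.0], syllable_s=0.4)
    print("peak with 150 ms syllables:", max(short))
    print("peak with 400 ms syllables:", max(slow))
```

With 150 ms syllables the trajectory never has a jump, falls slightly short of the high target, and is still above the final low target when the utterance ends, so each syllable's contour depends on its neighbors. With 400 ms syllables the targets are essentially reached, which is the sense in which the interaction matters most when the response time is comparable to the syllable duration.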

Carefully introducing physiological constraints into the models can help text-to-speech systems sound more like a real human. Öhman (1967) and Fujisaki (1983) were instrumental in incorporating physiological constraints in pitch generation. Xu et al. (1999) is a more recent work that provides a quantitative model for Chinese tones. Related work in articulatory modeling includes Browman and Goldstein (1990), Keating (1990), Moon and Lindblom (1994), and Fujimura (2000); it is reviewed in Perrier, Ostry and Laboissière (1996) and in the commentaries in Abry (1998).

