In Stem-ML, a ``tag'' is a tone template, along with a few
parameters that describe the scope of the template and how the
template interacts with its environment. It corresponds to the
mathematical description of an intonation event (e.g., a
tone or an accent). Variables in the equations below are defined in
Table 2.2.3. Each tag has
a parameter, type, which controls whether errors in the
shape or in the average value of the pitch curve matter more. In
this work, the targets, y, consist of a tone component riding on
top of the phrase curve, p.
To solve the optimization problem efficiently and
calculate the surface realization of prosody, we write simple
approximations to G and R so that the model reduces to
a set of linear equations:
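\[ G = \sum_t \dot{e}_t^2 + (\pi \cdot smooth/2)^2 \, \ddot{e}_t^2 + adroop^2 \, (e_t - p_t)^2 , \]
\[ R = \sum_{k \in \mathrm{tags}} s_k^2 \, r_k , \]
\[ r_k = \sum_{t \in \mathrm{tag}\ k} \cos(type \cdot \pi/2) \, (e_t - y_{k,t})^2 + \sin(type \cdot \pi/2) \, (\bar{e}_k - \bar{y}_k)^2 . \]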
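As a rough illustration of how the effort G and the error R could be evaluated on a sampled contour, the following sketch uses plain Python lists. Several details are assumptions made only for this example and are not taken from the text: the derivatives are replaced by finite differences with a unit time step, the droop term is taken to be the squared distance between e and the phrase curve p, the mean term of each tag is counted once per tag, and the names effort, tag_error and total_error are hypothetical.

```python
import math

def effort(e, p, smooth, adroop):
    """G: squared velocity and acceleration of e, plus droop toward p."""
    n = len(e)
    total = 0.0
    for t in range(n):
        # Assumed discretisation: forward difference for the first
        # derivative, central difference for the second, unit time step.
        d1 = e[t + 1] - e[t] if t + 1 < n else 0.0
        d2 = e[t + 1] - 2 * e[t] + e[t - 1] if 0 < t < n - 1 else 0.0
        total += d1 ** 2
        total += (math.pi * smooth / 2) ** 2 * d2 ** 2
        # Assumption: droop penalises distance from the phrase curve.
        total += adroop ** 2 * (e[t] - p[t]) ** 2
    return total

def tag_error(e, template, start, type_):
    """r_k for one tag whose template covers e[start:start + len(template)]."""
    seg = e[start:start + len(template)]
    shape_err = sum((et - yt) ** 2 for et, yt in zip(seg, template))
    e_mean = sum(seg) / len(seg)
    y_mean = sum(template) / len(template)
    # type_ = 0 weights the point-by-point shape error,
    # type_ = 1 weights the error in the mean value.
    # Assumption: the mean term is counted once per tag.
    return (math.cos(type_ * math.pi / 2) * shape_err
            + math.sin(type_ * math.pi / 2) * (e_mean - y_mean) ** 2)

def total_error(e, tags):
    """R: strength-weighted sum of tag errors.

    tags is a list of (strength, template, start, type_) tuples.
    """
    return sum(s ** 2 * tag_error(e, y, start, type_)
               for s, y, start, type_ in tags)
```

Both functions are quadratic in the samples of e, so setting the gradient of G + R to zero gives equations that are linear in e, which is what makes the model cheap to solve.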
Finally, f0 is e, scaled to the speaker's pitch range:
\[ f_{0,t} = g(e_t, add) \cdot range + base , \]
so that p and e are dimensionless quantities, typically
between 0 and 1. The function g()
handles linear (add = 1) or
Fujisaki (add = 0) scaling:
g(e, 1) = e for any e, and also
g(0, add) = 0 and g(1, add) = 1
for any add.
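The text fixes only the boundary behaviour of g(), not its functional form. The sketch below shows one hypothetical family that satisfies those constraints: it is linear at add = 1, exponential (additive in log f0, in the spirit of Fujisaki scaling) at add = 0, and passes through 0 and 1 for any add in between. The formula, the curvature constant c, and the names g and f0 are assumptions made for illustration and should not be read as the definition used in Stem-ML.

```python
import math

def g(e, add, c=1.0):
    """Hypothetical map from emphasis e to a normalised pitch.

    Satisfies g(e, 1) = e, g(0, add) = 0 and g(1, add) = 1.
    c is an assumed curvature constant, not a Stem-ML parameter.
    Intended for e in [0, 1] and add in [0, 1].
    """
    if abs(add) < 1e-9:
        # Limit add -> 0: exponential scaling.
        return math.expm1(c * e) / math.expm1(c)
    top = (1.0 + add * c * e) ** (1.0 / add) - 1.0
    bottom = (1.0 + add * c) ** (1.0 / add) - 1.0
    return top / bottom

def f0(e, add, base, range_, c=1.0):
    """Scale the dimensionless emphasis e to a pitch value."""
    return g(e, add, c) * range_ + base
```

With this choice, f0(0, add, base, range_) returns base and f0(1, add, base, range_) returns base + range_ for any add, so add changes only how intermediate values of e are spaced.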
Table: Definitions of parameters and
variables used in this paper. Daggers denote parameters defined
more fully in .
Symbol | Meaning | Used in
add | Controls the mapping between e and f0. See g(). |
adroop | Rate at which e droops toward the phrase curve in the absence of a tag. |
base | The speaker's relaxed f0. |
smooth | Response time of muscles. |
type | Is tone defined by its shape (0) or f0 value (1). |
atype | Controls how the amplitude of the template depends on the strength of a word. |
e_t | Emphasis, i.e., relative to the | Eqs. 3, 4
\bar{e} | Mean emphasis over the scope of a tag. | Eqs. 3, 5
\bar{y} | Mean value of a tone template. |
G | Effort expended in realizing the pitch contour. |
r_i | The summed error for word i between the template and the realized pitch contour. |
R | The summed error for an utterance between the ideal templates and the realized pitch contour. |
g() | Function to map between subjective emphasis (e) and objective f0. |
Greg Kochanski, Chilin Shih 2002-08-03