The Stem-ML model is built by placing tags on words, with adjustable parameters defining the tag shapes and positions (details below). For the confirmatory question experiment, we built a model with 48 free parameters to describe the accents, their position relative to syllables, and their strengths. These 48 parameters completely determine the intonation, once the accented word and syllable durations are known. This corresponds to 1.12 parameters per utterance, or about 1 parameter for each 10 syllables. The model is therefore very compact and makes strong predictions.
The parameter values are obtained by minimizing the RMS frequency difference between the data and the model. Unvoiced regions are excluded from the fit.
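The error measure can be illustrated with a minimal sketch. It assumes the measured and modeled pitch are sampled on the same frames and that a per-frame voicing flag is available; the function name `rms_f0_error` and the sample values are hypothetical and are not taken from the experiment.

```python
import math

def rms_f0_error(observed_hz, predicted_hz, voiced):
    """RMS difference between observed and modeled f0 over voiced frames only."""
    squared = [
        (obs - pred) ** 2
        for obs, pred, is_voiced in zip(observed_hz, predicted_hz, voiced)
        if is_voiced  # unvoiced frames carry no pitch and are skipped
    ]
    if not squared:
        raise ValueError("no voiced frames to compare")
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical frames: the second frame is unvoiced and is ignored.
observed = [120.0, 0.0, 130.0, 125.0]
predicted = [118.0, 110.0, 133.0, 125.0]
voiced = [True, False, True, True]
error = rms_f0_error(observed, predicted, voiced)
```

Only the three voiced frames contribute, so the large mismatch on the unvoiced frame has no effect on the result.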
We used a Levenberg-Marquardt algorithm [9,10] with numerical differentiation to find the parameters that give the best fit. The algorithm requires about 10 steps before the RMS error and parameters stabilize.
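For readers unfamiliar with the method, the following is a minimal sketch of a Levenberg-Marquardt loop with a forward-difference Jacobian. It is not the implementation used for the fits reported here: the toy three-parameter contour (a baseline plus one Gaussian bump), the damping schedule, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def numerical_jacobian(residual_fn, params, step=1e-6):
    """Forward-difference Jacobian of the residual vector."""
    base = residual_fn(params)
    jac = np.empty((base.size, params.size))
    for j in range(params.size):
        shifted = params.copy()
        shifted[j] += step
        jac[:, j] = (residual_fn(shifted) - base) / step
    return jac

def levenberg_marquardt(residual_fn, initial, n_steps=50, damping=1e-3):
    """Minimize the sum of squared residuals; return parameters and RMS error."""
    params = np.asarray(initial, dtype=float)
    residuals = residual_fn(params)
    cost = float(residuals @ residuals)
    for _ in range(n_steps):
        jac = numerical_jacobian(residual_fn, params)
        # Damped normal equations: (J^T J + damping * I) step = -J^T r
        normal = jac.T @ jac + damping * np.eye(params.size)
        step = np.linalg.solve(normal, -jac.T @ residuals)
        trial = params + step
        trial_residuals = residual_fn(trial)
        trial_cost = float(trial_residuals @ trial_residuals)
        if trial_cost < cost:
            # Accept the step and move toward Gauss-Newton behavior.
            params, residuals, cost = trial, trial_residuals, trial_cost
            damping *= 0.1
        else:
            # Reject the step and move toward gradient-descent behavior.
            damping *= 10.0
    return params, float(np.sqrt(cost / residuals.size))

# Toy stand-in for a pitch contour (hypothetical, not Stem-ML).
times = np.linspace(0.0, 1.0, 50)

def toy_contour(params):
    baseline, strength, center = params
    return baseline + strength * np.exp(-((times - center) / 0.1) ** 2)

true_params = np.array([120.0, 30.0, 0.5])
observed = toy_contour(true_params)

def toy_residuals(params):
    return toy_contour(params) - observed

fitted, rms = levenberg_marquardt(toy_residuals, [100.0, 20.0, 0.45])
```

The damping term interpolates between Gauss-Newton steps (small damping) and short gradient-descent steps (large damping), which is what lets the fit settle within a modest number of iterations on a well-behaved problem.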
Levenberg-Marquardt, like many data-fitting algorithms, can become trapped in a local minimum of the RMS error and may miss the global best fit. We have noticed this problem with the fits reported here: at least two local minima exist. If there are only a small number of local minima (as there seem to be for this model of this corpus), one can run multiple fits, starting from different sets of initial parameters, and then compare the final error values and final parameters of the different fitting runs.
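The multiple-start strategy can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the corpus: the one-dimensional error surface `toy_error` is a made-up function with two local minima, and plain gradient descent in `local_fit` stands in for a single Levenberg-Marquardt run.

```python
def toy_error(x):
    """Stand-in error surface with two local minima (not the Stem-ML error)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def toy_gradient(x):
    """Analytic derivative of toy_error."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def local_fit(start, rate=0.01, n_steps=2000):
    """Plain gradient descent from one starting point; returns (x, error)."""
    x = start
    for _ in range(n_steps):
        x -= rate * toy_gradient(x)
    return x, toy_error(x)

def multi_start(starts):
    """Run one local fit per start, then group runs by the minimum they reach."""
    distinct = {}
    for start in starts:
        x, err = local_fit(start)
        # Runs that end at the same rounded parameter found the same minimum.
        distinct[round(x, 3)] = err
    best_x = min(distinct, key=distinct.get)
    return distinct, best_x

minima, best = multi_start([-2.0, -0.5, 0.5, 2.0])
```

Here `minima` maps each distinct final parameter value to its final error, so the runs can be compared directly, and `best` is the parameter value with the lowest error among them. The approach is only practical when the number of local minima is small enough that a handful of starting points can reach all of them.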
Greg Kochanski, Chilin Shih 2002-08-03