Stem-ML makes one physically motivated assumption: that f0 is closely related to muscle tensions [12]. Because muscles cannot change position discontinuously, neighboring values of f0 must be connected smoothly and predictably. Most muscles cannot respond in less than 150 ms, a time comparable to the duration of a syllable, so we expect the intonation of neighboring syllables to affect each other. Because our model derives a smooth f0 contour from muscle dynamics, it is an extension of the models of [13,5,20].
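The consequence of a response time comparable to a syllable can be illustrated with a small numerical sketch. This is not the Stem-ML algorithm: it assumes, purely for illustration, that f0 follows a step-wise per-syllable pitch target through a first-order lag. The function name `smooth_f0`, the parameters `syllable_s`, `tau_s` and `dt_s`, and the target values are all hypothetical; only the 150 ms figure comes from the text.

```python
import math

def smooth_f0(targets_hz, syllable_s=0.15, tau_s=0.15, dt_s=0.005):
    """Illustrative first-order lag toward per-syllable pitch targets.

    Assumption (not from the paper): the muscle response is modeled as a
    single exponential with time constant tau_s.
    """
    # Exact discretization of an exponential approach over one time step.
    alpha = 1.0 - math.exp(-dt_s / tau_s)
    steps_per_syllable = round(syllable_s / dt_s)
    f0 = targets_hz[0]  # start at rest on the first target
    contour = []
    for target in targets_hz:
        for _ in range(steps_per_syllable):
            f0 += alpha * (target - f0)  # move a fraction of the way to the target
            contour.append(f0)
    return contour

# Hypothetical targets for three syllables: low, high, low.
contour = smooth_f0([120.0, 180.0, 120.0])
print(f"end of syllable 2: {contour[59]:.1f} Hz, end of syllable 3: {contour[89]:.1f} Hz")
```

Under these assumptions the contour never reaches the 180 Hz target within the second syllable, and the third syllable ends above its own 120 Hz target because it starts from wherever the second one left off. That is the sense in which neighboring syllables affect each other.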