Greg Kochanski
The source-filter model is a simple and valuable approximation of human speech production. It describes speech as a signal that begins with the vibration of the larynx and is then filtered as it passes up through the vocal tract. As the tongue and lips move, the filter changes and emphasizes different frequencies, producing different vowel sounds.
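The source-filter idea can be sketched numerically: a periodic glottal source (here an idealized impulse train) passed through a resonant filter standing in for the vocal tract. The sample rate, pitch, formant frequency, and bandwidth below are illustrative values of my own choosing, not taken from the text:

```python
import numpy as np

# Hypothetical parameters (my choices, not from the text): 8 kHz sample
# rate, 100 Hz fundamental, one formant near 700 Hz with 100 Hz bandwidth.
fs, f0 = 8000, 100
formant, bandwidth = 700.0, 100.0

# Source: an idealized glottal pulse train, one impulse per pitch period.
n = fs // 2                      # half a second of samples
source = np.zeros(n)
source[:: fs // f0] = 1.0

# Filter: a single two-pole resonator standing in for the vocal tract.
# y[t] = x[t] - a1*y[t-1] - a2*y[t-2]
r = np.exp(-np.pi * bandwidth / fs)
theta = 2 * np.pi * formant / fs
a1, a2 = -2 * r * np.cos(theta), r * r

speech = np.zeros(n)
for t in range(n):
    speech[t] = source[t]
    if t >= 1:
        speech[t] -= a1 * speech[t - 1]
    if t >= 2:
        speech[t] -= a2 * speech[t - 2]

# The filter boosts energy near the formant: the source spectrum is flat
# across harmonics, while the output spectrum peaks near 700 Hz.
spec_out = np.abs(np.fft.rfft(speech))
freqs = np.fft.rfftfreq(n, 1.0 / fs)
peak = freqs[np.argmax(spec_out)]
```

Moving the pole angle `theta` (i.e., moving the articulators) shifts the emphasized frequency, which is exactly how different vowel qualities arise from the same source in this model.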
However, applying the source-filter model is conceptually unsatisfactory, because one cannot uniquely reconstruct both the source signal and the filter from a single audio signal: there simply isn't enough information in one signal to unambiguously produce two outputs.
Useful algorithms do exist, such as inverse-filtering algorithms, but they rest on assumptions and fail when the speech's fundamental frequency approaches the first formant. Automatic methods can also fail if the formant structure is not clear enough to track, or when the acoustic signal is nonstationary.
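One standard inverse-filtering approach is linear predictive coding (LPC): fit an all-pole filter to the audio, then apply its inverse to estimate the source. The sketch below is my own illustration of that general technique (not the specific algorithms the text critiques); it synthesizes a signal with a known resonator, estimates the filter by solving the autocorrelation normal equations, and inverts it. All parameter values are hypothetical:

```python
import numpy as np

# Synthesize a test signal: impulse-train source through a known
# one-resonance "vocal tract" (700 Hz formant, 100 Hz bandwidth).
fs, f0, n = 8000, 100, 4000
source = np.zeros(n)
source[:: fs // f0] = 1.0
r = np.exp(-np.pi * 100.0 / fs)
theta = 2 * np.pi * 700.0 / fs
a_true = np.array([-2 * r * np.cos(theta), r * r])
x = np.zeros(n)
for t in range(n):
    x[t] = source[t]
    if t >= 1:
        x[t] -= a_true[0] * x[t - 1]
    if t >= 2:
        x[t] -= a_true[1] * x[t - 2]

# LPC, order 2: minimize prediction error by solving R a = -ac[1:].
order = 2
ac = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
R = np.array([[ac[abs(i - j)] for j in range(order)] for i in range(order)])
a_est = np.linalg.solve(R, -ac[1 : order + 1])

# Inverse filter: residual[t] = x[t] + a1*x[t-1] + a2*x[t-2].
# If the estimate is good, the residual approximates the source.
residual = x.copy()
for k in range(1, order + 1):
    residual[k:] += a_est[k - 1] * x[:-k]
```

Here the fit succeeds because the fundamental (100 Hz) is far below the formant (700 Hz), so the harmonics sample the resonance densely. When the fundamental approaches the first formant, the harmonics no longer pin down the resonance, and this kind of estimate degrades, which is the failure mode noted above.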
I have developed an algorithm for reconstructing an estimate of the source signal from sound propagated through the throat, using an array of external microphones. ["A Quasi-Glottogram signal," Kochanski, G. P. and Shih, Chilin] This signal can, in principle, provide a direct view of the source, without the complexities of the time-varying filter imposed by the vocal tract.
Last Modified Tue Mar 18 17:41:08 2003