Greg Kochanski

Experiments in Speech Science

People tend to think of speech technology (i.e. building speech synthesizers and speech recognition systems) as very closely related to speech science. But that's not entirely so. Speech technology is the engineering side of the pair and speech science is the science. (Linguistics is not necessarily either of these: it has elements of science in it, but spreads widely to cover things and people who are definitely doing neither science nor engineering.)

The difference between the science and the engineering viewpoints shows up not only in the conclusions of papers, but also in experimental techniques. In the conclusions, engineering papers tend to talk about methods and systems and how well they perform; science papers tend to talk about how people behave and the speech they produce.

And the experiments differ, too. In speech technology, it is entirely reasonable to carefully select the data that will be used to build the system. Back when I was building speech synthesis systems, that is what we often did. And it makes sense for that purpose. After all, it's not important that the TTS system exactly reflect the speech of a particular individual; it is much more important that the resulting speech is clean and easily understandable. So, cleaning "bad" utterances out of the speech data means that the resulting synthesizer has no odd and embarrassing outputs.

Of course, it means that the resulting system reflects some of the tastes of the builder, but there is nothing wrong with that. Engineering projects always reveal something about the engineer. And, there is not much wrong if selecting data causes the system to reflect some of the beliefs of the builder (so long as the beliefs are not too far from reality). There is art in engineering, along with the rules and math.

However, in speech science, building a good system is not the goal. On the science side, we're in the game of finding out how the average person speaks, so we can make our own beliefs close to reality. (Then publish, of course!) Given that goal, it would obviously be circular if we let our own beliefs influence the experiment. And, since we are not really average people in this respect (for instance, we pay much more attention to speech details than most people), we don't want to study ourselves, even inadvertently.

Of course, none of us would deliberately impose ourselves on the data - that would be a terrible waste of an experiment. But it's all too easy to do it by accident.

First of all, the experimental subjects are as smart as we researchers are. Especially if one uses university students (which is easy and typical), the subjects are the cream of Britain's youngsters. So, they are quite capable of thinking while doing an experiment and (if we are not careful) of figuring out what is going on.

There is a famous set of horrific experiments by Stanley Milgram in the 1960s, where he induced experimental subjects to "electrocute" people, just by telling them to. Experimental subjects are all too willing to do what they think the experimenter wants. This willingness is the underlying basis for ethical reviews of research - experimental subjects need some external protection because they are generally willing to defer to and assist the (metaphorical) guy in the (metaphorical) white coat.

In the context of speech science, Niedzielski (Nancy Niedzielski, 1999, "The effect of social information on the perception of sociolinguistic variables," Journal of Language and Social Psychology, 18(1):62-85) and Hay, Nolan and Drager (J. Hay, A. Nolan and K. Drager, From "Fush" to "Feesh": Exemplar Priming in Speech Perception, to appear in The Linguistic Review; preprint at http://www.ling.canterbury.ac.nz/jen/documents/Hay-Nolan-Drager.pdf ) have shown that subjects will use small clues to guess what answer they "should" produce.

So, if we are looking for differences between pairs of words, we absolutely must not give the subjects any clues as to what we want. If they decide that we want the words pronounced differently, they will do so. Or vice versa. All out of a sense of misguided helpfulness.

If you search the web for "Clever Hans", you'll see an entertaining example where even a horse could do that. (E.g. Thomas Sebeok, The Clever Hans Phenomenon: Communication with Horses, Whales, Apes, and People (1981, published by the Annals of the New York Academy of Sciences), ISBN-10: 0897661133, ISBN-13: 978-0897661133.) And undergraduates are smarter than horses. Clever Hans was a horse who -- apparently -- could do math problems, tapping the answer out with a hoof. It wasn't fraud: Hans could get the right answer even if his trainer was not around.

However, a set of clever experiments eventually figured out what was going on. If you gave a *different* math problem to Hans and to the people near him, he'd give the answer that the people expected. Probably the people would tense up as he approached the "correct" answer, and relax when he got there. Hans would sense this (perhaps by a change in expression or posture) and stop wherever the nearby people expected him to stop.

Now, the other effect that happens is self-delusion and confirmation bias. Humans (including scientists) love to see their opinions confirmed. It's all too easy to push inconvenient evidence away, so one must not give oneself the opportunity.

And, I speak from personal experience. I'm swayed by my unspoken assumptions too. Back in my Bachelor's thesis, I was measuring some reaction rates of atomic hydrogen at low temperatures. All the theorists at the time expected the reaction rate to increase as the density squared. (Everyone thought the important reaction was between pairs of hydrogen atoms.) Theorists are story-tellers. It's their job to provide convincing stories, and they do it well. Sometimes, they are even too convincing.

I measured that reaction rate, along with some other things. However, the slope of the reaction rate that I got was wrong: too steep. While I had the sense not to fudge my data, I plotted the data in such a way that it was essentially impossible to tell that I had the wrong density dependence. (And I drew no attention to it.)

Move forward about 4 years, and an important theoretical paper comes out. While everyone knew that there was also a term with a density-cubed dependence (which arises from simultaneous collisions of three hydrogen atoms), everyone assumed it was tiny compared to the density-squared term. Nope. Some clever Russian did the computation and discovered that the density-cubed term was far bigger than everyone had expected. In fact, in the regime where I did my Bachelor's thesis, it was the dominant term.
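To make the "slope" talk concrete, here is a minimal sketch of the arithmetic, assuming the standard two- and three-body loss terms (the rate constants k_2 and k_3 are illustrative symbols, not the published values). If n is the hydrogen density, the total loss rate is

    \frac{dn}{dt} = -k_2 n^2 - k_3 n^3

so on a log-log plot of loss rate against density, a pure two-body process has slope 2 and a pure three-body process has slope 3. A measured slope steeper than 2 is exactly the signature of a dominant n^3 term - which is what my data was trying to tell me.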

When I read that paper, I (correctly) felt like an idiot. If I had not let my beliefs get in the way of the data, I could have scooped the guy by 3 years. I'd measured what he predicted, but as it didn't agree with the story that I thought was correct, I ignored it. Never again. We experimenters are really here to make life difficult for the theorists, either by proving them wrong or by providing them with data that will be a challenge to explain.

So, experimental techniques need to be designed to give as little control as possible to the experimenter. Once the experiment is designed, you should have very few choices to make. You probably need some freedom to make choices, but you should be able to write rules to describe them.

For instance, in speech experiments, you are dealing with human subjects, and humans occasionally do odd things. People cough while speaking; people get tongue-tied. Sometimes, your experiment may strike a subject as absurdly funny, and he or she may start laughing. Or, a subject may become bored, and (to alleviate the boredom) may decide to speak in funny voices. In many experiments, these are good reasons to ignore some data.

But, even though you may need to allow yourself some flexibility to ignore data, the flexibility needs to be tightly restricted. The first and best option is to design your experiment so that these things don't happen. Second, design your analysis so that it is not sensitive to a small amount of strange data. (Collect more data than the minimum, use medians instead of means, or use other robust statistical techniques.)
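As a toy illustration of why the median is the robust choice (the numbers are hypothetical vowel durations, not real data):

    import statistics

    # Hypothetical vowel-duration measurements (seconds) from one subject.
    # One trial is contaminated: say, the subject laughed mid-utterance.
    durations = [0.21, 0.19, 0.22, 0.20, 0.23, 0.18, 0.21, 1.45]

    print(f"mean   = {statistics.mean(durations):.3f} s")    # 0.361 s, dragged up by one bad trial
    print(f"median = {statistics.median(durations):.3f} s")  # 0.210 s, barely moved

One wild trial moves the mean by over 70% but leaves the median essentially untouched, so an analysis built on medians never has to decide whether to discard that trial in the first place.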

Third, if you still need to ignore data, it must be done by a set of rules. Ideally, these rules are defined in advance, but that may not always be possible. The rules should be simple and obvious, and listed in the final published paper. Especially if the rules are defined after the data is collected, they need to be very simple and very obvious, and there must be no plausible way that the rules could transmit your theoretical biases into the experimental results.
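One way to keep such rules honest is to write them down as code before looking at the results, so they are fixed in advance and can be reported verbatim in the paper. A minimal sketch (the field names and thresholds here are hypothetical, not from any real study):

    # Pre-registered exclusion rules, fixed before the analysis begins.
    def keep(utterance):
        """Return True if the utterance passes every pre-registered rule."""
        if utterance["contains_laughter"]:               # Rule 1: no laughter in the recording
            return False
        if utterance["n_disfluencies"] > 2:              # Rule 2: at most two coughs/restarts
            return False
        if not 0.5 <= utterance["duration_s"] <= 10.0:   # Rule 3: plausible recording length
            return False
        return True

    # Two made-up utterance records, standing in for the real corpus.
    data = [
        {"contains_laughter": False, "n_disfluencies": 1, "duration_s": 2.3},
        {"contains_laughter": True,  "n_disfluencies": 0, "duration_s": 2.1},
    ]
    kept = [u for u in data if keep(u)]
    print(f"discarded {len(data) - len(kept)} of {len(data)} utterances")

Because the rules are mechanical, anyone can re-run them on the raw data and get the same subset; there is no step where your expectations get a vote.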

Finally, the total amount of data to be ignored should be as small as possible, and always mentioned in the published paper. The chance of changing the answer by data selection (and thus fooling yourself and others) grows with how much data you ignore. Ideally, this would be no more than a few percent of the total.

If the experiment is such that you really need to ignore substantial amounts of data, then the only option is to treat the selection as part of the experiment. Sometimes it is reasonable to let the experimental subject decide whether to drop some data and try again. This can introduce biases, but at least they won't be your biases: they will be the biases of randomly chosen experimental subjects. Alternatively, you can design a two-phase experiment, where one set of subjects produces speech, and then a second set of subjects selects the data.

Proper experimental design is all about making sure that you cannot fool yourself. Take a look at http://kochanski.org/gpk/teaching/0401Oxford/doubt.pdf and/or Section 7 of http://kochanski.org/gpk/papers/2005/2005BeyondF0.pdf .

