Greg Kochanski

8 October 2008 - edited transcript. Audio can be found at http://media.podcasts.ox.ac.uk/oucs/oxonian_interviews/kochanski_interview.mp3?CAMEFROM=podcastsRSS . Also see http://podcasts.ox.ac.uk/ .

Interviewer: The study of speech and language is a complicated area. Dr. Greg Kochanski, a research fellow at the Oxford University Phonetics Laboratory, talks about how experiments in phonetics are conducted and how speech changes over time.
When you do these experiments, are you finding that you're studying speech in the environment or speech in the lab? People talk differently when they know that they're being recorded.
Greg Kochanski: They do and they don't, yes. We've actually done some experiments to find out if they talk the same in the experiments as they do in more normal circumstances, and you can find some differences. It's certainly true that if you put a person in a more formal situation, they'll talk a bit differently than if they're talking informally to friends and whatnot.
In informal speech, you really make use of the fact that the person who's listening to you understands you well. It's more abbreviated, more compressed, typically less precise. But if you more or less know what the changes are (between formal and informal speech), it's not a big issue, because a lot of things we say don't change, or at least a lot of aspects don't change. Your basic pronunciation, for instance, is more or less the same. These differences are something to be aware of, for sure, but if experimental speech were horribly different from normal speech, we wouldn't really understand each other in different situations.
Interviewer: How big is one of these experiments?
Greg Kochanski: Well, one way to look at it is that the size of the experiments is set by funding issues and how much money you can get from research councils. Some experiments need to be huge, and if you can't get funding for them, then you just can't do them. So, in practice, a typical experiment might have 10 or 20 people in it, talking for an hour or two each. That ends up being quite a lot of data, actually.
Up until recently, you were always worried about disk space for this sort of experiment. Sometimes, we have 700 gigabytes of data and intermediate results and computations and whatnot floating around. Certainly, you can imagine speech experiments and data analyses that would tax even modern computers. But it's not big science in the sense of particle physics or astronomy, where they have whole-sky surveys and things like that. It tends to be bigger science than psychology, because you can't as easily break linguistics down into a little experiment where you're just testing a single hypothesis. It's hard to break it down into very simple yes/no kinds of questions.
Interviewer: And that's because you can't really decide what the questions are?
Greg Kochanski: Well, it's partly because we don't really know what the questions are in linguistics, but it is also partly because language is an integrated thing. If I say a sentence, the way I say it is going to be dependent on the meaning. It's going to be dependent on the context. It's going to be dependent on who I'm talking to. So, there's a lot of complexity. You can't really learn anything interesting by looking at a single word in isolation. You're always looking at interactions between things, differences between things.
For instance, language evolution ties into this. One of the differences -- the big difference between, say, American and British English comes from what they call the Great Vowel Shift, which happened from the 1300s through the 1700s. Scottish English is pre-vowel-shift, American English is partially pre-vowel-shift, and southern British English is post-vowel-shift. What happened is that the "a" sound in words like "bath" (the American pronunciation, as in "tag") turned into "bath" (southern British), and that triggered the whole sequence of vowel shifts, sort of going around a loop.
Basically, the southern British vowels (after the shift) are done with the tongue a little bit higher in the mouth. So "bath" (American) is done with the mouth wide open, tongue down, and "bath" (southern British) is done with the tongue rather higher up. The reason this whole chain of shifts happened is that when you push one vowel up, it gets too close to the next vowel, so words get confused. You'll be confusing words because the raised "a" sound is now too close to some other sounds. Those words sort of have to get out of the way, so they get pushed into some different pronunciation. And that, of course, creates another set of confusions.
If you like, one initial shift can propagate across the whole language, changing the pronunciation of half the words, or a quarter of the words, or something like that -- some thousands of words in the language. It eventually stabilized, and now you have two dialects. It's not the kind of phenomenon you can treat with a local view of just a few words. To explain that kind of thing, really, you have to deal with the language as a whole, so there are a lot of experiments you'd like to do on a very large scale, dealing with the whole language. Of course, you can't, for practical reasons.
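As a rough illustration of the push-chain mechanism described above, here is a minimal sketch. The one-dimensional "height" values and the minimum-distance threshold are invented for illustration; real vowel systems live in at least two dimensions and shift over generations, not in one step.

    # A toy "push chain": raising one vowel pushes its neighbors up in turn.
    # All numbers are invented for illustration.

    MIN_GAP = 1.0  # assumed minimum perceptual distance between vowels

    def push_chain(heights, which, amount, min_gap=MIN_GAP):
        """Raise vowel `which` by `amount`; any vowel that ends up too close
        to the one below it gets pushed up in turn."""
        heights = sorted(heights)
        heights[which] += amount
        for i in range(which + 1, len(heights)):
            if heights[i] - heights[i - 1] < min_gap:
                heights[i] = heights[i - 1] + min_gap  # pushed out of the way
        return heights

    # Five vowels, evenly spaced by height; raising the lowest one
    # propagates all the way up the system.
    print(push_chain([0.0, 1.0, 2.0, 3.0, 4.0], which=0, amount=0.9))
    # -> [0.9, 1.9, 2.9, 3.9, 4.9]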
Even the smallest experiments are complex and messy -- they have to deal with interactions, catch a lot of context, and connect to a lot of the real world -- and that's because language touches all the bits of our humanity. It can give you a good view of what's going on in the brain, compared to many techniques. You can compare it to fMRI (functional Magnetic Resonance Imaging), which can show you which areas of the brain are active -- that's a beautiful technique. Certainly, fMRI tells you things you can't learn by listening to a person. But on the other hand, if you listen to a person, you certainly learn some things that are going on in their brain that don't show up as hot-spots on fMRI. (At least the things the subjects want you to know about.)
Language has evolved as a human mechanism for -- well, for many things, but one of them is letting your friends know what's going on inside. I mean, the other things that language is used for are power games and pushing people around and whatnot, but it certainly gives you a view of the inside of the brain that has value.
Interviewer: Going back to the change of language, why do languages evolve and how do we know that languages in the past have changed? How do we track history in languages?
Greg Kochanski: Well, history is one of the better ways that we know that languages have evolved, because people write grammar books. You find grammar books from quite a variety of cultures. The Romans did them around 100 B.C., and by the 1640s, you had Ben Jonson writing his English Grammar. People are fairly sophisticated in this field. In fact, there's a grammar book done by a guy named Pāṇini in India around 400 B.C. which is remarkably modern in many respects. It talks about how to pronounce things and how words go together in various ways, and the likes of it was not seen again until, really, the Greeks and the Romans did a few -- and of course, you've got Chinese rhyme dictionaries popping up fairly early also.
Language is something that people like to write about, probably because it's important from a social-class point of view and a social-interaction point of view. People realize very rapidly that if you don't speak like the group, you're considered an outsider. People don't want to be outsiders, so they worry about how to talk and give advice about how to speak.
So, for the last thousand years, you can track language changes through grammar books and spelling. In many languages -- except for English, really -- spelling is pretty well connected with the way people speak. English has one of the more complicated and horrific spelling systems (letter-to-sound rules) of any language. But that's partly because English spelling was fixed in the 1600s, and the English language has continued to evolve. English spelling was standardized not too long after the printing press came here and hasn't changed much. So, if you read your letters fairly literally, you're speaking the language of Queen Elizabeth I, and if you read them in the modern way, you speak the language of Queen Elizabeth II.
You also get some clues from poetry and things like that. That's a complicated issue, because poetic standards have changed. For instance, everyone thinks of rhymes at the ends of lines as the standard of poetry. But in English, up until 1400 or so, it was alliterative poetry: sounds at the beginning of words had to match in strong positions, and the endings didn't much matter. Sir Gawain and the Green Knight is very much that way, and a few others. So, poetry can give you clues, but it's complicated, because you have to understand the rules of the poetry of the culture.
Oh, and you also get people writing about foreign languages. That's a fairly common thing that people have done throughout history; not always in a very informed way, but sometimes very perceptively, talking about all the funny ways that foreigners speak. That can give you a clue about both the way the foreigners spoke and the way the writer spoke, because obviously they were not the same if it sounded funny. That kind of research -- putting clues together to reconstruct ancient languages -- is called historical linguistics. Oxford is one of the few places where people do much of that anymore.
Now, that was all historical stuff. People look at changes in modern (current) languages in a variety of ways. But historical language change is a field that's been going for quite a while, and it's sort of gotten quiet, if you like. Maybe it's a bit out of fashion because it's not high-tech. Anyhow, Oxford still has a good group doing useful stuff in historical linguistics. It provides a lot of important data on how languages have changed and evolved, and that's one of the important questions in linguistics, even though it's not one we have all the answers to, because there are a lot of complicated issues in it.
Interviewer: Well, at the very basic end, why are there differences between different languages today? Do we all have a common root, or are there several common roots?
Greg Kochanski: A common root? Well, no one knows. We know all the Indo-European languages probably have a common root. Indo-European languages basically come from the Caucasus, and are spoken in most of Europe and, well, India.
The trouble is, you can trace back the connections between languages only so far before they get lost in the noise. Languages change enough over a few thousand years that the relationships between them become unclear; languages are always borrowing from other languages and things like that. But back in the 1800s, people realized that there were strong connections between English and German, and English and French, and French and Latin, and French and Italian. All these languages have many equivalent pairs of words, like "father" versus "pater".
There are a whole bunch of similarities which you can pull together with a few simple rules to show that there is a set of words common to all the languages, but that the languages have diverged by changing a few rules about how you pronounce things. With simple rules about sound change, you can tie together quite a lot of similarities, and you can trace similarities in the grammar and stuff too. But it is pretty clear that you can only look back maybe four thousand years that way, and it gets pretty fuzzy at the far end.
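As a rough illustration of what such rule-based sound correspondences look like, here is a minimal sketch in the spirit of Grimm's law. The rules are drastically simplified (real correspondences are context-sensitive and don't apply only word-initially), and the word list is just for illustration.

    # Toy sound correspondences between Latin-like and English-like forms,
    # applied only at the start of a word for simplicity.

    CORRESPONDENCES = [  # (Latin sound, Germanic/English sound)
        ("p", "f"),    # pater -> father
        ("t", "th"),   # tres  -> three
        ("c", "h"),    # cornu -> horn  (Latin "c" here is a k-sound)
    ]

    def apply_rules(word):
        """Apply the first matching correspondence to the start of a word."""
        for old, new in CORRESPONDENCES:
            if word.startswith(old):
                return new + word[len(old):]
        return word

    for latin, english in [("pater", "father"), ("tres", "three"), ("cornu", "horn")]:
        print(f"{latin} -> {apply_rules(latin)}  (cf. English {english})")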
So, for languages that split off longer ago than that, you just can't say how they're connected. We have no idea, for instance, what connection there is, if any, between Japanese or Mandarin Chinese and English. There are similarities and differences, and we're just lost in the mists of time.
Interviewer: Is it feasible to assume that there was a connection at some stage?
Greg Kochanski: Well, people have worked on the question of when human language evolved, and it's pretty clear it evolved a lot more than 4,000 years ago. People are talking about time scales of around 100,000 years. So, it's easy to imagine lots of history going on that we don't know about. For a lot of human history, people were fairly separated, each in their own little village, with not much commerce from village to village, and that is a situation that grows languages. You can still see the way it works, even today, in places like New Guinea. The highlands of New Guinea are home to almost half of the world's languages, all in that one small geographic area, basically because there were no roads and the terrain is very rough, so people hardly went beyond their neighboring villages.
Over time, small changes in the language accumulate differently in each village until there are enough of them that the dialects in the two villages become mutually unintelligible, and then you have a new language. As long as the two dialects are mutually intelligible, there is a sort of glue holding them together. But once it becomes too hard to figure out what the other person is saying, contact and commerce drop off, and there's no reason to keep the languages the same at all. They just drift in different directions. (There are so many directions in which you can change a language, there's no reason for two dialects to drift the same way.) So, that kind of thing could have happened for millennia.
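As a rough illustration of that drift, here is a minimal sketch. The number of dimensions, the step size, and the intelligibility threshold are all invented for illustration, and real dialect divergence is of course not an unbiased random walk.

    # Two isolated dialects as points in an abstract "pronunciation space".
    # Each generation adds small independent random changes to each dialect,
    # so the distance between them grows until they are unintelligible.

    import random

    def generations_until_split(dims=50, step=0.05, threshold=5.0, seed=1):
        random.seed(seed)
        a = [0.0] * dims
        b = [0.0] * dims
        gen = 0
        while sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 < threshold:
            gen += 1
            a = [x + random.gauss(0, step) for x in a]
            b = [y + random.gauss(0, step) for y in b]
        return gen

    print(generations_until_split(), "generations until mutual unintelligibility")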
People are trying to do mathematical models of language evolution and have had some success. But so far, the modeling they've been doing has been basically analogous to biological genetics, and that's no more than part of the story. Language doesn't evolve just like species do. In biological evolution, you get your genes from your parents and they got theirs from their parents and whatnot, so you can track a tree of ancestors. But in language, you borrow. If someone comes and conquers you, they'll leave a bunch of words. Or, if you go trading with someone and they've got some new toy or technique, you'll buy their toy and you may take the word they use, also.
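As a small illustration of why borrowing breaks the tree analogy, here is a minimal sketch. The edges are drastically simplified for illustration (English descent and its French and Latin borrowings are real, but a serious model would weight edges by vocabulary counts).

    # Inheritance gives each language one parent (a tree); borrowing adds
    # extra edges, turning the tree into a network.

    INHERITS_FROM = {            # tree edges: child -> parent
        "English": "Proto-Germanic",
        "German": "Proto-Germanic",
        "French": "Latin",
        "Italian": "Latin",
    }
    BORROWED_FROM = {            # extra edges that break the tree structure
        "English": ["French", "Latin"],  # e.g. conquest, trade, scholarship
    }

    def word_sources(lang):
        """All languages that contributed words to `lang`."""
        return [INHERITS_FROM[lang]] + BORROWED_FROM.get(lang, [])

    print(word_sources("English"))  # ['Proto-Germanic', 'French', 'Latin']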
So, language evolution is not just a case of people living in the same village year after year with their language drifting. Some word can sweep across a whole continent because some idea reached that continent. So, it's also tied in with politics and history and invasions and commerce and technology and all kinds of things.
Interviewer: Can you talk about the relationship between mathematics and phonetics? Traditionally, maths doesn't seem to have any association with the humanities.
Greg Kochanski: Well, we kind of are humanities and kind of are science. Phonetics is the experimental end of linguistics and linguistics is a very broad field which really goes from -- on one end, people doing articulatory models with computers looking at muscles moving and aerodynamics and -- on the other end, really philosophy and cognitive science and that kind of thing. Obviously, some ends of the field are more mathematical than others. We are in the humanities in the sense that we are trying to understand how people think and explain the human condition, and language is a big chunk of the human condition. Language is the glue that keeps us from being locked into our own little heads. But on the other hand, there are some corners of linguistics that we understand well enough to start applying the techniques of science to the questions that we do understand, and those ends end up being fairly mathematical.
We're just finishing a research project where we're looking at tongue position and tongue motion with an MRI machine, and we're taking that data and trying to test linguistic theories. Chomsky based his theories on things called "features", which are little instructions to the tongue, like "high" and "low". People have built on that, talking about feature spreading, and we've turned that kind of broad theoretical description into a set of mathematical models. If you want to capture most of the important possibilities that linguists have written about, you need more than 200 mathematical models. We want to make these models, then test them on the data to see which ones work. By doing this, we can eliminate a lot of possibilities as not being a good explanation of the data; this is a state-of-the-art approach to linguistics that hasn't really caught on everywhere yet. We're hoping it will; it should. It's a scientific approach to linguistics, if you like. This kind of approach is only possible now because we have enough computer power that (a) we can do the image processing and figure out where the tongue is, and (b) we can evaluate the mathematical models of where the tongue ought to be -- and not just one model, either.
Language is complex enough that it's hard or impractical to build a precise theory. So, since theorists aren't superhuman, you end up with fairly vague theories that don't give details. If you want to make them precise enough to test, you have to put in a lot of options. For instance, one of the models we're looking at has features like "high" and "low" to say where the tongue should go. But not every sound specifies the tongue position in detail; some sounds don't specify some features. If a feature is unspecified, what do you do? Where do you put your tongue? Well, some theorists say that you look ahead and take the feature from the sound that hasn't happened yet. Other theorists say that you hold on to the past: if the tongue was high, you keep it high until you need to move it somewhere else. So, there are two options for how you fill these empty features to control your tongue. You combine that pair of options with a few more and you end up with four or six possible models to test; combine that with something else, and you end up with 12 or 24. The number of detailed models just expands immensely, because you have a whole bunch of choices to make which aren't really specified by the linguistic theories. But you have to specify them in order to turn a theory into a concrete model that you can test. So we're doing it using a little brute force; we're testing them all.
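As a rough illustration of how those unspecified choices multiply, here is a minimal sketch; the option names are invented for illustration, and the real project's model set was much larger.

    # Enumerating every combination of a few theoretical choices gives the
    # full set of concrete, testable models.

    from itertools import product

    OPTIONS = {
        "unspecified_feature": ["anticipate_next", "carry_over_previous"],
        "spreading": ["left_to_right", "right_to_left", "none"],
        "targets": ["per_segment", "per_syllable"],
    }

    models = list(product(*OPTIONS.values()))
    print(len(models), "models to test")   # 2 * 3 * 2 = 12
    for m in models[:3]:                   # show a few of them
        print(dict(zip(OPTIONS, m)))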
That's really the difference between a lot of linguistics and the rest of science. Science more or less operates following the rules that were first written down by Karl Popper: you come up with a hypothesis, you make predictions with the hypothesis, and you test it. If it works, good; you can try testing it again. If it doesn't work, you throw away the hypothesis and you go off and do something else. It's a very evolutionary approach, a competition between ideas. If the idea makes a prediction and the prediction works, it's good. If it doesn't make a prediction, it's not very useful. If it doesn't work, it's junk. Popper's recipe works if you have theories that make specific predictions. The trouble with a lot of linguistics is that it's a complex, messy, squishy subject, and even if you have a theoretical description of things, it's hard to translate that into predictions, and so it's hard to test models. Because you can't test and eliminate models, a lot of different theoretical descriptions co-exist, and you don't have a high level of competition between ideas.
Part of the difference between linguistics and the hard sciences is very much of a cultural thing. In physics, if somebody says "this is my model," he or she is implicitly saying that "this is the way the universe works" and everybody else's model is wrong. Everyone believes that the universe does things in only one way, so there can be only one correct model that describes how it works. In linguistics, instead you say "this is my view of things." Or, "I can describe things this way," but you're not implying that every other view is wrong. Someone else can have their own viewpoint or description of whatever is going on. So, different linguistic viewpoints co-exist. They're not really considered to be in conflict because in fact, it's relatively hard to translate them into specific predictions to find if they are actually in conflict or not.
Different fields use different metaphors: in the hard sciences, you describe the mechanism that causes something to happen. In Linguistics, you separate your viewpoint from the phenomenon, and you say "this is what it looks like from my hilltop" with the understanding that it might look different from another hilltop. Because of this, linguistics has lots of viewpoints, so we have lots of different models to test.
So, anyhow, we have found some cases where you can translate linguistic theories or viewpoints into a collection of models that are detailed enough to test. Then we can test the collection of models and find which ones fly and which ones fail. Sometimes, you find that one theoretical viewpoint leads to models which more or less universally fail and then you can throw it out. That's the hope for progress in the field.
This approach won't always work, because sometimes a linguistic viewpoint will give you some successes and some failures, and then life gets complicated and messy. But that's not always going to be the case. When you do get a fairly clear-cut answer, you've learned something real, something you can't learn just by sitting there theorizing and thinking about language.
Interviewer: Is it possible to predict what changes will occur in language in the future?
Greg Kochanski: In practice?
Interviewer: Uh-huh.
Greg Kochanski: Certainly not at our current level of understanding. We know a little bit about how languages change, but the understanding we have is more descriptive than predictive. It centers around the overlap of things: the need to be clear and the need not to confuse the people you're talking to. So, we understand the process where something moves and it bumps into something else and pushes it. But why things move is driven by a lot of factors, including who's got social status, and a lot of fashion things and cultural things at every moment.
People speak a bunch of different dialects in the U.K. and if the status of one group rises, by and large, the language will tend to shift in their direction, or at least some aspects of language will shift in their direction. But, even if language precisely follows the money and status, you can't predict which way the language will move unless you know who's going to be the top dog.
Even then, television shows can make certain bits of language important, interesting, nifty. Well, there was a sweet little fad -- 30 years ago, there was a pet rock fad in the U.S., where you could buy for Christmas a little plastic box with a window in it and a rock. It was sort of set up as a terrarium, so you could have your own pet rock. For God knows what reasons, people would pay real money for these things, and it was just a fad. It's the kind of thing that seems like a good idea at the time but, a few years later, may seem less of a good idea. Anyhow, words can work like that too. For instance, people say "Not" after sentences. That started happening a few years ago, and it may or may not last. But I can't imagine coming up with a prediction of either one of those fads in advance.
Interviewer: What about the pace of language evolution? Is language still changing at quite a rapid pace?
Greg Kochanski: Yes. There were predictions back 50 or 100 years ago that we would all speak the same language and speak the same dialect because of television and audio recording. In fact, people predicted that language evolution would stop because of tape recorders or maybe it was even wire recorders, because we would be able to hear the way the previous generations spoke and we would want to speak that way. Well, these days, we can easily hear the way that previous generations speak, but we don't actually have any particular desire to speak exactly the same way. That may be a teenage thing.
Interestingly enough, dialects haven't gone away. Dialects seem as strong now as they were (maybe not quite as strong) but dialects are doing very well in the U.K. and in the U.S. I think one of the factors that wasn't expected was that people make a distinction between the languages they understand and the languages they produce. Everyone understands the standard dialects of English, but that doesn't mean that they necessarily want to produce them. Probably it's a social thing. Probably you want to sound like your friends and if your friends use a particular dialect, well, that's what you'll produce, even though you understand six or eight other varieties of English quite well. Or, maybe it's an identity thing: you think of yourself as a Scot, and don't want to pretend to be something you're not. Or, maybe it's a fear-of-embarrassment thing: you don't want to do a bad job of some new accent, so you never speak it, so it never gets better.
Interviewer: I had a general question: when we have ideas in the mind, how do they get expressed? What's involved in starting from an idea and putting it into speech?
Greg Kochanski: Well, quite a lot. There is certainly not a simple connection between how we think and language. From a neurological point of view, there are a lot of steps: motor planning, and also, before that, constructing the sentence, where you have to worry about what words you're going to use.
Interviewer: What about speech in noisy surroundings? How do we fill in the gaps when we haven't heard all of someone's speech?
Greg Kochanski: Yes. Well, language has a lot of redundancy, both acoustically and in the syntax. For instance, if you look at Latin, you have a bunch of cases (nominative, genitive, dative, accusative, ablative, and vocative). Or, if you look at French, every noun is either masculine or feminine. If you look at English, you have plurality carried through from the noun to the verbs. Many things in the sentence need to agree on whether it is singular or plural, and in languages that have case and gender, sentences need agreement on case and on gender.
All those properties are really a way to insert some redundancy into the sentence. This redundancy allows error correction and error detection by the listener. So, if I say something that starts plural and it doesn't end plural, you immediately know that you misheard something, or I misspoke, or something has gone wrong. Likewise, in Latin, if you messed up the case agreement, the listener knows that something has gone wrong, and can go back and ask for corrections, or can at least realize: "I misunderstood something; I need to fill it in from context." (Of course, even with this mechanism to catch some errors, sometimes we just get confused; sometimes we don't understand what the other person is saying, for a variety of reasons.)
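As a small illustration of agreement working as error detection, here is a minimal sketch; the two-word lexicon and the single number-agreement rule are invented for illustration.

    # A toy agreement checker: disagreement between subject and verb is a
    # signal that something was misheard or misspoken.

    NOUN_NUMBER = {"dog": "singular", "dogs": "plural"}
    VERB_NUMBER = {"barks": "singular", "bark": "plural"}

    def agreement_ok(subject, verb):
        """True if the subject and verb agree in number."""
        return NOUN_NUMBER[subject] == VERB_NUMBER[verb]

    print(agreement_ok("dogs", "bark"))   # True: consistent
    print(agreement_ok("dogs", "barks"))  # False: the listener knows
                                          # something has gone wrong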
But this redundancy also occurs at the acoustic level. There are some good experiments showing that the way you pronounce one sound affects the pronunciation of the neighboring sounds, sometimes even more than a syllable away. This is especially relevant if, for instance, someone's hammering, right? Hammer blows completely destroy the particular sound underneath the hammering, but you have a lot of information from the neighboring sounds. From the sounds around the hammer blow, you can reconstruct the missing sound just by thinking this way: "That sounded as if his mouth was wide open at the end of the syllable. So presumably, his mouth was wide open in the middle of the syllable, so I can figure out what the vowel was, even though I didn't actually hear it." You can do all this because the vowel changes the consonants near it.
Interviewer: And that's an automatic process?
Greg Kochanski: And that's an automatic process, yes. The experiments on this are done by replacing bits of speech with white noise (instead of hammer blows). You can replace a remarkable amount of speech with white noise and still extract meaning from it.
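As a rough illustration of that manipulation, here is a minimal sketch. The slice length, the kept fraction, and the stand-in signal are invented for illustration; real stimuli would be actual recorded speech.

    # Replace most of a signal with white noise, leaving short periodic
    # slices where the original speech peeks through.

    import numpy as np

    def mask_with_noise(signal, keep_fraction=0.1, slice_len=200, rng=None):
        """Keep `keep_fraction` of the signal in slices of `slice_len`
        samples; everything else becomes white noise of matching power."""
        rng = rng or np.random.default_rng(0)
        out = rng.normal(0.0, signal.std(), size=signal.shape)
        period = int(slice_len / keep_fraction)  # one audible slice per period
        for start in range(0, len(signal), period):
            out[start:start + slice_len] = signal[start:start + slice_len]
        return out

    speech = np.random.default_rng(1).normal(size=16000)  # stand-in for 1 s of audio
    masked = mask_with_noise(speech)   # 90% noise, 10% original signal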
I did this experiment once where we were trying to compare the performance of automated speech-recognition systems with human performance. The idea was that you take a five-digit number and replace increasing amounts of it with white noise. So, it would be like -- well, I can't really do it, but it's -- two, one, three, three, four, one, five, with bursts of hiss -- and if you left little gaps in the white noise, you could actually replace 90% of the sounds with noise and people would still get the number right half the time. You have to have fairly small little slices, and the little slices where you can hear bits of the digit have to be close enough together that they give you a little bit of a clue to more or less each digit. It's a very freaky experiment, because you listen to this and you are absolutely convinced you're just guessing. You know you can't possibly understand this hissy mess. It's basically just [speaker made a sound], but then you guess, and you find that half the time, you got the five-digit number right.
That's a bit surprising, because there are 100,000 different five-digit numbers, and half of the time you pick the right one out of those 100,000 choices, even though you think you're just guessing.
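As a rough information-theoretic aside (the arithmetic here is an illustration, not from the interview): getting the number exactly right half the time means the listener is recovering nearly all of the information in it.

    # Bits in a uniformly random five-digit number, and a crude lower bound
    # on the information a listener recovers by being right half the time.

    import math

    total = math.log2(100_000)            # about 16.6 bits in the number
    recovered = math.log2(0.5 * 100_000)  # log2(0.5 / (1/100_000)), about 15.6 bits
    print(f"{total:.1f} bits total, at least about {recovered:.1f} bits recovered")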
I tried it myself, because I didn't believe the results, and it's really, psychologically, a very interesting process. You don't even know you're gluing the pieces back together and making a guess, but it works remarkably well. And that's a case where it's just the acoustic context -- well, not quite, because you also know that each sound can only be one of a few possible number names. So if you hear a [speaker made a "v" sound], you know it probably comes from a seven, so there's a seven around there. If you didn't hear a [speaker made a fricative sound] sound, that's probably a nine or a one, I'd guess.
Small clues can give you a lot of information in a restricted environment like that. But yes, that happens automatically; you don't even know it's happening.
Interviewer: It's quite remarkable.
Greg Kochanski: Yes. There's a lot of stuff that goes on up in the brain that one is not really aware of. That's why we do experiments: you can't discover these things just by thinking about them.
