Greg Kochanski
8 October 2008 - edited transcript.
Audio can be found at
http://media.podcasts.ox.ac.uk/oucs/oxonian_interviews/kochanski_interview.mp3?CAMEFROM=podcastsRSS
. Also see http://podcasts.ox.ac.uk/ .
Interviewer: The study of speech and language is a complicated
area. Dr. Greg Kochanski, a research fellow at the Oxford
University Phonetics Laboratory, talks about how experiments in
phonetics are conducted and how speech changes over time.
When you do these experiments, are you finding that you're
studying speech in the environment or speech in the lab? People
talk differently when they know that they're being recorded.
Greg Kochanski: They do and they don't, yes. We've actually
done some experiments to find out if they talk the same in the
experiments as they do in more normal circumstances, and you
can find some differences. It's certainly true that if you put
a person in a more formal situation, they'll talk a bit
differently than if they're talking informally to friends and
whatnot.
In informal speech, you really make use of the fact that the
person who's listening to you understands you well. It's more
abbreviated, more compressed, less precise typically. But if
you more or less know what the changes are (between formal and
informal speech), it's not a big issue, because a lot of things
we say don't change or at least a lot of aspects don't change.
Your basic pronunciation, for instance, is more or less the
same. These differences are something to be aware of for sure,
but if experimental speech were horribly different from normal
speech, we wouldn't really understand each other in different
situations.
Interviewer: How big is one of these experiments?
Greg Kochanski: Well, one way to look at it is that the size of
the experiments is set by funding issues and how much money
you can get from research councils. Some experiments need to be
huge and if you can't get funding for them, then you just can't
do them. So, in practice, a typical experiment might have 10 or
20 people in it talking for an hour or two each. That ends up
being quite a lot of data actually.
Up until recently, you were always worried about disk space for
this sort of experiment. Sometimes, we have 700 Gigabytes of
data and intermediate results and computations and whatnot
floating around. Certainly, you can imagine speech experiments
and data analyses that would tax even modern computers. But
it's not big science in the sense of particle physics or
astronomy where they have whole-sky surveys and things like
that. It tends to be bigger science than psychology because you
can't as easily break linguistics down into a little experiment
where you're just testing a single hypothesis. It's hard to
break it down into very simple yes/no kinds of questions.
Interviewer: And that's because you can't really decide what
the questions are?
Greg Kochanski: Well, it's partly because we don't really know
what the questions are in linguistics, but it is also partly
because language is an integrated thing. If I say a sentence,
the way I say it is going to be dependent on the meaning. It's
going to be dependent on the context. It's going to be
dependent on who I'm talking to. So, there's a lot of
complexity. You can't really learn anything interesting by
looking at a single word in isolation. You're always looking at
interactions between things, differences between things.
For instance, language evolution ties into this. One of the
differences -- the big difference between say American and
British English comes from what they call the Great Vowel
Shift, which happened from the 1300s through the 1700s. Scottish
English is pre-Vowel Shift, American English is partially
pre-Vowel Shift, and southern British English is post-Vowel
Shift. What happened is that the "a" sound in words like "bath"
(American pronunciation, as in "tag") turned into "bath"
(southern British), and that triggered the whole
sequence of vowel shifts sort of going around a loop.
Basically, the southern British vowels (after the shift) are
done with the tongue a little bit higher in the mouth. So
"bath" (American) is done with a mouth wide open, tongue down
and "bath" (southern British) is done with the tongue rather
higher up. The reason this whole chain of shifts happens is
because when you push one vowel up, it gets too close to the
next vowel, so words get confused. You'll be confusing words
because now the raised "a" sound is going to be too close to
some other sounds. Those words sort of have to get out of the
way, so they get pushed into some different pronunciation. And
that, of course, creates another set of confusions.
If you like, one initial shift can propagate across the whole
language, changing the pronunciation of half the words or a quarter of
the words or something like that, some thousands of words in
the language. It eventually stabilized, and now you have two
dialects. It's not the kind of phenomenon you can treat with a
local view of just a few words. To explain that kind of thing,
really, you have to deal with the language as a whole, so there
are a lot of experiments you'd like to do on a very large scale, dealing
with the whole language. Of course, you can't, for practical
reasons.
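To make the push-chain mechanism concrete, here is a toy sketch in Python. Everything in it -- the vowel heights, the minimum spacing, the size of the initial shift -- is invented for illustration, not taken from any study: raising one vowel crowds its neighbor, which gets pushed up in turn.

```python
# Toy push-chain: vowel "heights" on an arbitrary scale, low to high.
# All numbers are invented for illustration.
vowels = {"a": 1.0, "e": 2.0, "i": 3.0}
MIN_GAP = 0.8  # below this separation, words start to get confused

def raise_vowel(name, amount, vowels):
    """Raise one vowel, then push any crowded neighbors out of the way."""
    order = sorted(vowels, key=vowels.get)  # low-to-high order
    vowels[name] += amount
    for lo, hi in zip(order, order[1:]):
        if vowels[hi] - vowels[lo] < MIN_GAP:
            vowels[hi] = vowels[lo] + MIN_GAP  # the chain propagates
    return vowels

print(raise_vowel("a", 0.7, vowels))
# {'a': 1.7, 'e': 2.5, 'i': 3.3} -- one shift has moved every vowel
```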
But even the smallest experiments are complex and messy, because
they have to deal with interactions, catch a lot of context, and
connect to a lot of the real world; language touches all the
bits of our humanity. It can give you a good view of what's
going on in the brain compared
to many techniques. You can compare it to fMRI (functional
Magnetic Resonance Imaging) which can show you which areas of
the brain are active -- that's a beautiful technique.
Certainly, fMRI tells you things you can't learn by listening
to a person. But on the other hand, if you listen to a person,
you certainly learn some things that are going on in their
brain that don't show up as hot-spots on fMRI. (At least the
things the subjects want you to know about.)
Language has evolved as a human mechanism for -- well, for many
things, but one of them is letting your friends know what's
going on inside. I mean, the other things that language is used
for are power games and pushing people around and whatnot, but
it certainly gives you a view of the inside of the brain that
has value.
Interviewer: Going back to the change of language, why do
languages evolve and how do we know that languages in the past
have changed? How do we track history in languages?
Greg Kochanski: Well, history is one of the better ways that we
know that languages have evolved. It's because people write
grammar books. You find grammar books from quite a variety of
cultures. The Romans did them around 100 B.C., and by the 1640s,
you had Ben Jonson writing his English Grammar. People are
fairly sophisticated in this field. But in fact, there's a
grammar book done by a guy named Pāṇini in India around 400
B.C. which is remarkably modern in many respects. It talks
about how to pronounce things and how words go together in
various ways, and its like was not seen again until, really, the
Greeks and the Romans did a few -- but of course, you've got
Chinese rhyme dictionaries popping up fairly early also.
Language is something that people like to write about probably
because it's important from a social class point of view and
social interaction point of view. People realize very rapidly
that if you don't speak with the group, you're considered an
outsider. People don't want to be outsiders, so they worry
about how to talk and give advice about how to speak.
So, for the last thousand years, you track language changes by
grammar books and spelling. Spelling is -- in many languages
except for English, really, spelling is pretty well connected
with the way people speak. English has one of the more
complicated and horrific spelling systems (letter-to-sound
rules) of any language. But that's partly because English
spelling was fixed in the 1600s, and the English language has
continued to evolve. English spelling was standardized not too
long after the printing press came here and hasn't changed
much. So, if you read your letters fairly literally, you're
speaking the language of Queen Elizabeth I and if you read them
in the modern way, you can speak the language of Queen
Elizabeth II.
You also get some clues from poetry and things like that.
That's a complicated issue because poetic standards have
changed. For instance, everyone thinks of rhymes at the end as
being the standard of poetry. But in English, up until 1400 or
so, it was alliterative poetry. Sounds at the beginning of the
word had to match in strong positions, and the endings didn't
much matter. Sir Gawain and the Green Knight is very much that way
and a few others. So, poetry can give you clues, but it's
complicated because you have to understand the rules of the
poetry of the culture.
Oh, and also you get people writing about foreign
languages. That's a fairly common thing that people have done
throughout history; not always in a very informed way but
sometimes very perceptively, talking about all the funny ways
that foreigners speak. That can give you a clue about both the
way the foreigners speak and the way the writer speaks, because
obviously they are not the same if it sounds funny. That kind of
research -- putting clues together to reconstruct ancient
languages -- is called historical linguistics.
Oxford is one of the few places where people do much of that
anymore.
Now, that was all historical stuff. People look at changes in
modern (current) languages in a variety of ways. But historical
language change has been a field that's been going for quite a
while and it's sort of gotten quiet if you like. Maybe it's a
bit out of fashion because it's not high tech. Anyhow, Oxford
still has a good group doing useful stuff in historical
linguistics. It provides a lot of important data for how
languages have changed and evolved, and that's one of the
important questions in linguistics, even though it's not one we
have all the answers to, because there's a lot of complicated
issues in it.
Interviewer: Well, on the very basic end, why are there
differences between languages today? Do we all have a common
root, or are there several common roots?
Greg Kochanski: Well, common root, well, no one knows. We know
all the Indo-European languages probably have a common root.
Indo-European languages basically come from the Caucasus, and
are spoken in most of Europe and, well, India.
The trouble is, you can trace back the connections between
languages only so far before they get lost in the noise.
Languages change enough over a few thousand years that the
relationships between them become unclear. Languages are always
borrowing from other languages and things like that. But back
in the 1800s, people realized that there were strong connections
between English and German, and English and French, and French
and Latin, and French and Italian. All these languages have
many equivalent pairs of words like "father" versus "pater".
There are a whole bunch of similarities which you could pull
together with a few simple rules to show that here is a set of
words common to all the languages, but the languages have
diverged by changing a few rules about how you pronounce
things. With simple rules about sound change, you can tie
together quite a lot of similarities, and you can trace
similarities in the grammar and stuff too. But it is pretty
clear that you can only look back maybe four thousand years
that way, and it gets pretty fuzzy at the far end.
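As a cartoon of those "few simple rules," here is a sketch in Python. The substitutions loosely echo Grimm's law, the regular consonant shifts that separate the Germanic branch from Latin and its relatives; the rule set and word list are simplified illustrations, not a real reconstruction.

```python
# Toy comparative method: apply regular sound-change rules to Latin
# forms and watch English-like cognates emerge. Vastly simplified.
RULES = [
    ("p", "f"),   # as in pater -> father
    ("t", "th"),  # as in tres -> three
    ("c", "h"),   # as in cornu -> horn (Latin c = /k/)
]

def apply_rules(word):
    for old, new in RULES:
        word = word.replace(old, new)
    return word

for latin, english in [("pater", "father"), ("tres", "three"),
                       ("cornu", "horn")]:
    print(f"{latin} -> {apply_rules(latin)}  (cf. English '{english}')")
```

The endings don't come out right, of course; real comparative work also tracks vowels, suffixes, and the contexts in which each change applies. But the principle is the one described above: a handful of regular substitutions ties together hundreds of word pairs at once.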
So, for languages that split off earlier than that, you just
can't say how they're connected. We have no idea, for
instance what, if any, connection there is between Japanese or
Mandarin Chinese and English. There are similarities and
differences, and we're just lost in the mists of time.
Interviewer: Is it feasible to assume that there was a
connection at some stage?
Greg Kochanski: Well, people have worked on the question of
when human language evolved and it's pretty clear it evolved a
lot more than 4,000 years ago. People are talking about time
scales of around 100,000 years. So, it's easy to imagine lots of
history going on that we don't know about. For a lot of human
history, people were fairly separated, each in their own little
village, with not much commerce from village to village, and
that is a situation that grows languages. You can still see the
way it works, even today, in places like New Guinea. Now, the
highlands of New Guinea are home to almost half of the world's
languages, all in that small geographic area, and that's
basically because there were no roads and very rough terrain, so
people hardly went beyond their neighboring villages.
Over time, small changes in the language accumulate differently
in each village until there are enough so that the dialects in
the two villages become mutually unintelligible, and then you
have a new language. As long as the two dialects are mutually
intelligible, there is a sort of glue holding them together.
But once it becomes too hard to figure out what the other person is
saying, contact and commerce drop off, and there's no reason to
keep the languages the same at all. They just drift in
different directions. (There are so many directions that you
can change a language, there's no reason for two dialects to
drift the same way.) So, that kind of thing could have happened
for millennia.
People are trying to do mathematical models of language
evolution and have had some success. But so far, the modeling
that they've been doing has been basically analogous to
biological genetics, and that's no more than part of it.
Language has not evolved just like species. In biological
evolution, you get your genes from your parents and they got
theirs from their parents and whatnot so you can track a tree
of ancestors. But in language, you borrow. If someone comes and
conquers you, they'll leave a bunch of words. Or, if you go
trading with someone and they've got some new toy or technique
out there, you'll buy their toy and you may take the word they
use, also.
So, language evolution is not just a case of people living in
the same village year after year with their language
drifting. Some word could just sweep all of a continent because
some idea reached that continent. So, it's also tied in with
politics and history and invasions and commerce and technology
and all kinds of things.
Interviewer: Can you talk about the relationship between
mathematics and phonetics? Traditionally, maths doesn't seem to
have any association with the humanities.
Greg Kochanski: Well, we kind of are humanities and kind of are
science. Phonetics is the experimental end of linguistics and
linguistics is a very broad field which really goes from -- on
one end, people doing articulatory models with computers
looking at muscles moving and aerodynamics and -- on the other
end, really philosophy and cognitive science and that kind of
thing. Obviously, some ends of the field are more mathematical
than others. We are in the humanities in the sense that we are
trying to understand how people think and explain the human
condition, and language is a big chunk of the human condition.
Language is the glue that keeps us from being locked into our
own little heads. But on the other hand, there are some corners
of linguistics that we understand well enough to start applying
the techniques of science to the questions that we do
understand, and those ends end up being fairly mathematical.
We're just finishing a research project where we're looking at
tongue position and tongue motion with an MRI machine, and we're
taking that data and trying to test linguistic theories.
Chomsky based his theories on things called "features", which
are little instructions to the tongue, like "high" and "low".
People have built on that, talking about feature spreading, and
we've turned that kind of broad theoretical description into a
set of mathematical models. If you want to capture most of the
important possibilities that linguists have written about, you
need more than 200 mathematical models. We want to make these
models, then test them on the data to see which one works. By
doing this, we can eliminate a lot of possibilities as not
being a good explanation of the data; this is a
state-of-the-art approach to linguistics that hasn't really
caught on yet everywhere. We're hoping it will; it should. It's
a scientific approach to linguistics if you like. This kind of
approach is only possible now because we have enough computer
power that (a) we can do the image processing, we can figure out
where the tongue is, and (b) we can evaluate the mathematical
models of where the tongue ought to be, and not just one,
either.
Language is complex enough so that it's hard or impractical to
do a precise theory. So, since theorists aren't superhuman, you
end up with fairly vague theories that don't give details. If
you want to try to make them precise enough to test, you have
to put in a lot of options. For instance, one of the models we're
looking at has features like "high" and "low" to say where the
tongue should go. But, not every sound specifies the tongue
position in detail. Some sounds don't specify some features. If
a feature is unspecified, what do you do? Where do you put your
tongue? Well, some theorists say that you take the feature to
the right of the one you're doing (the feature that hasn't
happened yet). Other theorists say that you hold on to the past:
if the tongue was high, you keep it high until you need to move
it somewhere else. So, there are two options on how you fill
these empty features to control your tongue. You combine that
pair of options with a few more and you'll end up with four or
six possible models to test; then you combine that with
something else, and you end up with 12 or 24. The number of detailed models
you have just expands immensely because you have a whole bunch
of choices to make, which aren't really specified by the
linguistic theories. But you have to specify them in order to
turn it into a concrete model that you can test. So we're doing
it using a little brute force; we're testing them all.
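The bookkeeping behind that brute-force expansion is easy to sketch. In this hypothetical Python fragment (the axis names and the scoring step are invented for illustration, not the project's actual code), each under-specified theoretical choice becomes one axis, and the concrete models are the Cartesian product of all the axes:

```python
# Hypothetical sketch: each under-specified choice in the theory is one
# axis; concrete testable models are the Cartesian product of the axes.
import itertools

CHOICES = {
    "empty_feature": ["take_from_right", "hold_from_left"],  # the pair above
    "spreading": ["none", "one_segment", "long_range"],
    "transition": ["linear", "smoothed"],
}

def all_models():
    names = sorted(CHOICES)
    for combo in itertools.product(*(CHOICES[n] for n in names)):
        yield dict(zip(names, combo))

models = list(all_models())
print(len(models), "concrete models")  # 2 * 3 * 2 = 12; more axes multiply fast

# In the real project, each model would then be scored against the
# measured tongue trajectories, and the consistently worst ones discarded.
```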
That's really the difference between a lot of linguistics and
the rest of science. Science more or less operates following
the rules that were first written down by Karl Popper, which is
that you come up with a hypothesis and you make predictions
with the hypothesis and you test it. If it works, good; you can
try testing it again. If it doesn't work, you throw away the
hypothesis and you go off and do something else. It's a very
evolutionary approach. It's a competition between ideas. If the
idea makes a prediction and the prediction works, it's good. If
it doesn't make a prediction, it's not very useful. If it
doesn't work, it's junk. Popper's recipe works if you have
theories that make specific predictions. The trouble with a lot
of linguistics is, it's a complex and messy and squishy subject,
and even if you have a theoretical description of things, it's
hard to translate that into predictions, and so it's hard to
test models. Because you can't test and eliminate models, a lot
of different theoretical predictions co-exist and you don't
have a high level of competition between ideas.
Part of the difference between linguistics and the hard
sciences is very much of a cultural thing. In physics, if
somebody says "this is my model," he or she is implicitly
saying that "this is the way the universe works" and everybody
else's model is wrong. Everyone believes that the universe does
things in only one way, so there can be only one correct model
that describes how it works. In linguistics, instead you say
"this is my view of things." Or, "I can describe things this
way," but you're not implying that every other view is wrong.
Someone else can have their own viewpoint or description of
whatever is going on. So, different linguistic viewpoints
co-exist. They're not really considered to be in conflict
because in fact, it's relatively hard to translate them into
specific predictions to find if they are actually in conflict
or not.
Different fields use different metaphors: in the hard sciences,
you describe the mechanism that causes something to happen. In
Linguistics, you separate your viewpoint from the phenomenon,
and you say "this is what it looks like from my hilltop" with
the understanding that it might look different from another
hilltop. Because of this, linguistics has lots of viewpoints,
so we have lots of different models to test.
So, anyhow, we have found some cases where you can translate
linguistic theories or viewpoints into a collection of models
that are detailed enough to test. Then we can test the
collection of models and find which ones fly and which ones
fail. Sometimes, you find that one theoretical viewpoint leads
to models which more or less universally fail and then you can
throw it out. That's the hope for progress in the field.
This approach won't always work because sometimes, a linguistic
viewpoint will give you some successes and some failures and
then life gets complicated and messy, but that's not always
going to be the case. When you do get a fairly clear-cut
answer, then you've learned something real, learned something
that you can't get just by sitting there theorizing and
thinking about language.
Interviewer: Is it possible to predict what changes will occur
in language in the future?
Greg Kochanski: In practice?
Interviewer: Uh-huh.
Greg Kochanski: Certainly not at our current level of
understanding. We know a little bit about how languages change.
But the understanding we have is more descriptive than
predictive. It centers around the overlap of things, the need
to be clear and the need to not confuse people you're talking
to. So, we understand the process where something moves and it
bumps into something else and pushes it. But why things move is
driven by a lot of factors including who's got social status
and a lot of fashion things and cultural things at every
moment.
People speak a bunch of different dialects in the U.K. and if
the status of one group rises, by and large, the language will
tend to shift in their direction, or at least some aspects of
language will shift in their direction. But, even if language
precisely follows the money and status, you can't predict which
way the language will move unless you know who's going to be
the top dog.
Even then, television shows can make certain bits of language
important, interesting, nifty. Well, there was a sweet little
fad -- 30 years ago, there was a pet rock fad in
the U.S. where you could buy for Christmas a little plastic box
with a window in it and a rock. It was sort of set up as a
terrarium so you could have your own pet rock. For God knows what
reasons, people would pay real money for these things and it
was just a fad. It's a kind of thing that seems like a good
idea at the time but a few years later, it may seem less of a
good idea. But, anyhow, words can work like that too. For
instance, people say "Not" after sentences. That started
happening a few years ago, and it may or may not last. But, I
can't imagine coming up with a prediction of either one of
those fads in advance.
Interviewer: What about the pace of language evolution there,
is it still changing at quite a rapid pace?
Greg Kochanski: Yes. There were predictions back 50 or 100
years ago that we would all speak the same language and speak
the same dialect because of television and audio recording. In
fact, people predicted that language evolution would stop
because of tape recorders or maybe it was even wire recorders,
because we would be able to hear the way the previous
generations spoke and we would want to speak that way. Well,
these days, we can easily hear the way that previous
generations speak, but we don't actually have any particular
desire to speak exactly the same way. That may be a teenage
thing.
Interestingly enough, dialects haven't gone away. Dialects seem
as strong now as they were (maybe not quite as strong) but
dialects are doing very well in the U.K. and in the U.S. I
think one of the factors that wasn't expected was that people
make a distinction between the languages they understand and
the languages they produce. Everyone understands the standard
dialects of English, but that doesn't mean that they
necessarily want to produce them. Probably it's a social thing.
Probably you want to sound like your friends and if your
friends use a particular dialect, well, that's what you'll
produce, even though you understand six or eight other
varieties of English quite well. Or, maybe it's an identity
thing: you think of yourself as a Scot, and don't want to
pretend to be something you're not. Or, maybe it's a
fear-of-embarrassment thing: you don't want to do a bad job of
some new accent, so you never speak it, so it never gets
better.
Interviewer: I had a general question about how ideas in the
mind get expressed. What is involved in starting from an idea
and putting it into speech?
Greg Kochanski: Well, quite a lot. There is certainly not a
simple connection between how we think and language. From a
neurological point of view, there are a lot of steps: motor
planning, and also, before that, constructing the sentence,
where you have to worry about what words you're going to use.
Interviewer: What about speech in noisy surroundings? How do we
fill in the gaps when we haven't heard all of someone's speech?
Greg Kochanski: Yes. Well, language has a lot of redundancy
both acoustically and also in the syntax. For instance, if you
look at Latin, you have a bunch of cases (the cases are
nominative, genitive, dative, accusative, ablative, and
vocative). Or, if you look at French, every noun is either
masculine or feminine. If you look at English, you have plural
carried through from the noun to the verbs. Many things in the
sentence need to agree on whether it is singular or plural,
and for languages that have case and gender, sentences need
agreement on cases, and agreement on gender.
All those properties are really a way to insert some redundancy
into the sentence. This redundancy allows error correction and
error detection by the listener. So, if I say something that
starts plural and it doesn't end plural, you immediately know
that you misheard something or I misspoke something or
something has gone wrong. And likewise, in Latin, if you messed
up the case agreement, the listener knows that something has
gone wrong, and can go back and ask for corrections or he/she
can at least realize that "I misunderstood something: I need to
fill it in from context." (Of course, even with this mechanism
to catch some errors, sometimes we just get confused, sometimes
we don't understand what the other person is saying for a
variety of reasons.)
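As a toy illustration of that error-detection idea, here is a small Python check. The miniature lexicon is invented for illustration; a real system would need morphology, not a word list. A listener can flag a heard sentence as suspect whenever its number marking is self-contradictory:

```python
# Toy redundancy check: conflicting singular/plural marking signals
# a likely mishearing. The lexicon is a tiny invented illustration.
NUMBER = {
    "dog": "singular", "dogs": "plural",
    "runs": "singular", "run": "plural",
    "the": None,  # unmarked words carry no number information
}

def misheard(words):
    """True if the words carry contradictory number marking."""
    marks = {NUMBER.get(w) for w in words} - {None}
    return len(marks) > 1

print(misheard("the dogs runs".split()))  # True: starts plural, ends singular
print(misheard("the dog runs".split()))   # False: marking is consistent
```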
But this redundancy also occurs at the acoustic level. There
are some good experiments showing that the way you pronounce
one sound affects the pronunciation of the neighboring sounds,
sometimes even more than a syllable away, so that -- for
instance, this is especially relevant if someone's hammering,
right? I mean hammer blows completely destroy the particular
sound underneath the hammering, but you have a lot of
information from the neighboring sounds. From the sounds around
the hammer blow, you can reconstruct the missing sound just by
thinking this way: That sounded as if his mouth was wide open
at the end of the syllable. So presumably, his mouth was wide
open in the middle of the syllable, so I can figure out what
the vowel was even though I didn't actually hear it. You can do
all this because the vowel changes the consonants near it.
Interviewer: And that's an automatic process?
Greg Kochanski: And that's an automatic process, yes. The
experiments on this are done by replacing bits of speech with
white noise (instead of hammer blows). You can replace a
remarkable amount of speech with white noise and still extract
meaning from it.
I did this experiment once where we were trying to compare the
performance of automated speech recognition systems with human
performance. The idea would be you take a five-digit number and
replace increasing amounts of it with white noise. So, it would
be like -- well, I can't really do it, but it -- two, one,
three, three, four, one, five with bursts of hiss -- and if you
left little gaps in the white noise, you could actually replace
90% of the sounds with noise and people would still get
the number right half the time. You have to have fairly small
little slices and the little slices where you can hear bits of
the digit have to be close enough together so they give you a
little bit of a clue to more or less each digit. It's a very
freaky experiment because you listen to this and you are
absolutely convinced you're just guessing. You know you can't
possibly understand this hissy mess. It's just basically, it's
just [speaker made a sound] but then you guess, and you find
that half the time, you got this five-digit number right.
That's a bit surprising, because there are 100,000 different
five-digit numbers, and half of the time you pick the right one
out of those 100,000 choices, even though you think you're just
guessing.
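The stimulus construction he describes is easy to sketch. This hypothetical Python/NumPy fragment (the parameters and the synthetic stand-in signal are assumptions, not the original experiment's materials) replaces roughly 90% of a waveform with level-matched white noise while leaving short, regularly spaced glimpses:

```python
# Hypothetical sketch of the masking procedure: keep ~10% of the signal
# in short, evenly spaced glimpses; replace the rest with white noise.
import numpy as np

rng = np.random.default_rng(0)
FS = 16000                                  # sample rate, Hz
speech = rng.standard_normal(2 * FS)        # stand-in for 2 s of speech

def mask_with_noise(signal, frame_ms=50, keep_every=10, fs=FS):
    frame = fs * frame_ms // 1000           # samples per frame
    out = signal.copy()
    for i in range(len(signal) // frame):
        if i % keep_every != 0:             # keep 1 frame in 10 audible
            seg = slice(i * frame, (i + 1) * frame)
            out[seg] = rng.standard_normal(frame) * signal.std()
    return out

stimulus = mask_with_noise(speech)          # ~90% noise, brief glimpses
```

With 50 ms glimpses every half second, a slowly read digit string gets at least one audible sliver per digit, matching the condition described above: the audible slices have to fall close enough together to give a clue to more or less each digit.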
I tried it myself because I didn't believe the results and it's
really, psychologically, a very interesting process. You don't
even know you're gluing the pieces back together and making a
guess, but it works quite remarkably well, and that's a case
where it's just the acoustic context. Well, not quite --
because you know that each sound can only be one of a few
possible number names. So if you hear a [speaker made a "v"
sound], you know it probably comes from a seven and there's a
seven around there. If you didn't hear a [speaker made a
fricative sound] sound, that's probably a nine or a one, I'd
guess.
Small clues can give you a lot of information in a
restricted environment like that but yes, that happens
automatically, you don't even know it's happening.
Interviewer: It's quite remarkable.
Greg Kochanski: Yes. There's a lot of stuff that goes on up in
the brain that one is not really aware of. That's why we do
experiments: you can't discover these things just by thinking
about them.