Linguistics cannot develop in isolation.

Greg Kochanski

Linguistic Isolation

It is an uncomfortable fact that linguistics is being left behind as other fields adopt new techniques. The field needs to build bridges so that linguistics can make use of the tools and knowledge that the rest of science has generated.

Linguistics can be considered the study of certain very complex animal behavior, and it should see itself as one end of the grand reductionist enterprise that connects experimental psychology to neurobiology, through biochemistry and eventually down to physics. It does not stand in isolation, even though it (like any other science) has its own techniques and knowledge.

Neuroscience and Computational Neuroscience can elucidate the detailed strategies that the brain uses to move articulators and thus create syllables and words. If we can build mathematical models of these strategies, we can explain phenomena like vowel reduction, coarticulation effects and the whole range of prosodic modifications of speech. Much of phonetics is ultimately concerned with understanding how the brain maneuvers continuously variable articulator positions to transmit discrete linguistic symbols to the listener.

A solid phonetic model could then be used to explore phonology. Given a trustworthy phonetic model, anyone can take two phonological theories, predict the phonology of a corpus of speech, and ask "Which theory matches the data better?" The failing theory can then be discarded. Even an incomplete phonetic model can be used this way to check phonological theories; it can force the phonological symbols to be used in a consistent manner, and reveal whether the phonology is supplying enough information to explain the acoustic evidence.

Half a century ago, when Chomsky was new, there was no real alternative to introspection if one wanted to find out what was going on in someone's brain. The linguist simply had to ask himself or a subject what was going on, and accept the description that came back. But progress has been made. Experimental techniques like magnetic resonance imaging, electromagnetic articulography (EMA) and X-ray microbeam tracking can watch articulators move without much interference with normal speech. Magneto- and electro-encephalograms can provide temporally precise (but spatially crude) clues to the activity inside the brain. Conversely, fMRI provides a spatial description of focussed brain activity (though with poor temporal resolution). In addition to direct views of the brain, reaction-time and eye-tracking experiments provide hints about the processes that the brain uses.

Although none of these techniques will visualize a syntax tree, they provide some direct access to the processing of language, and may allow linguists to deduce how syntax is processed. Having a set of different techniques is crucial, because any one technique has biases and limitations. Linguistics' historically heavy dependence on introspection means that the field has presumably inherited the limitations and biases of that data-gathering technique. As linguistics moves away from a single data source, we will likely be surprised at some of the biases that will be uncovered.

Beyond that, statistical techniques like bootstrap resampling can be used to extend historical linguistics: they can answer questions like "When should we stop clustering languages?" Perhaps they can also tell when the sum of many inconclusive similarities between language families becomes strong enough to yield a solid relationship. Or, perhaps it would be better to answer small questions, such as "Given a certain, small amount of text in a dead language, what is the chance that the language requires agreement of case?"
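
To make the bootstrap idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each language is coded as a list of binary traits; the three languages, their trait values, and the function names `hamming` and `bootstrap_support` are invented here and come from no real dataset. The sketch resamples traits with replacement and reports how often two languages end up closer to each other than either is to a third.

```python
import random

def hamming(a, b, cols):
    """Count the resampled trait columns on which two languages differ."""
    return sum(a[i] != b[i] for i in cols)

def bootstrap_support(features, pair, other, n_boot=2000, seed=0):
    """Fraction of bootstrap replicates in which the two languages in `pair`
    are strictly closer to each other than either one is to `other`."""
    rng = random.Random(seed)
    x, y = (features[name] for name in pair)
    z = features[other]
    n = len(x)
    hits = 0
    for _ in range(n_boot):
        # Resample the traits (columns) with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        d_xy = hamming(x, y, cols)
        if d_xy < hamming(x, z, cols) and d_xy < hamming(y, z, cols):
            hits += 1
    return hits / n_boot

# Invented toy data (an assumption for illustration): 1 = trait present, 0 = absent.
toy = {
    "A": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    "B": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    "C": [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
}
support = bootstrap_support(toy, ("A", "B"), "C")
print(f"bootstrap support for grouping A with B: {support:.2f}")
```

A grouping that survives in nearly every replicate is well supported by the traits in hand; one that survives in only a modest share of them suggests that the clustering has run past what the data can justify. Where to draw that line is a judgement this sketch does not make.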

Bayesian belief networks could be applied to syntax, to prune over-elaborated theories, and to show syntacticians where the solid ground is. Using them, one could hope to answer a question like this: Assume that theory X depends on thirty attested judgements of grammaticality. Assume, further, that the attested judgements are drawn randomly from one or another of two closely related dialects of English that differ in one small way. What is the probability that theory X will duplicate the first dialect? The second? What is the probability that the theory constructed from two dialects matches neither dialect's grammar?
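
A full belief network is beyond a short example, but a drastically simplified version of this question can be worked through directly. The sketch below is a plain probability calculation, not a belief network, and it rests on assumptions that are not part of the question above: each judgement comes from either dialect with equal probability; each judgement independently has probability `p_diagnostic` of touching the one construction on which the dialects differ; and a theory fitted to conflicting judgements on that construction matches neither dialect. The function name and the parameter values are hypothetical.

```python
def outcome_probabilities(n_judgements=30, p_diagnostic=0.1):
    """Probabilities for a theory fitted to judgements mixed from two dialects.

    Assumptions (illustrative only): judgements are independent, each comes
    from dialect 1 or dialect 2 with probability 1/2, and each is 'diagnostic'
    (touches the single point of difference) with probability p_diagnostic.
    """
    # No judgement touches the difference, so the data cannot decide.
    q_none = (1 - p_diagnostic) ** n_judgements
    # At least one diagnostic judgement, and all of them from the same dialect:
    # sum over k >= 1 of C(n, k) p^k (1 - p)^(n - k) (1/2)^k.
    q_single = (1 - p_diagnostic / 2) ** n_judgements - q_none
    return {
        "undetermined": q_none,
        "matches_dialect_1": q_single,
        "matches_dialect_2": q_single,
        # Diagnostic judgements from both dialects: the data conflict.
        "matches_neither": 1 - q_none - 2 * q_single,
    }

for p in (0.03, 0.1, 0.3):
    probs = outcome_probabilities(30, p)
    print(p, {name: round(value, 3) for name, value in probs.items()})
```

Even this toy version shows the shape of the answer: the more of the attested judgements bear on the point where the dialects differ, the more likely it is that the theory describes a grammar that nobody speaks.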

Machine learning systems are another example, but also provide a cautionary tale. They are powerful mathematical tools that can absorb a vast corpus, and then do a competent job of parsing or predicting text. However, despite their power and utility, they have contributed less than one might hope to our actual understanding of language. Why? Because they were developed and applied to language by computer scientists and software engineers, and the engineering approach came along with the tools. Users of machine learning systems have (on average) been less interested in learning than in building systems, and more interested in the tools than the humans. It should be no surprise that the results reflect the interests of the users. This issue needs to be kept in mind for curriculum development. One needs to remember that linguistics has an engineering aspect, and that the language engineering part of the field has distinct goals and somewhat different methods from the scientific side of linguistics, and also from the parts of linguistics that border on literature and have the flavor of the humanities.

It is particularly important to include these kinds of skills in the linguistic curriculum, so that the graduating students will have the tools they need to make real progress in understanding language, how we use it, and how our brains process it. The last thing we want is to educate them to the mid-twentieth century, and set them adrift in the twenty-first.


Last Modified Sun Dec 7 09:32:14 2003
