Does each language have its own performance style, or are they all the same apart from the words? One has the strong impression that they are not all the same, and linguists have talked about each language having its own style or rhythm, in analogy with musical rhythms. Our aim in this project was to find quantitative metrics that would capture this impression.
Our working assumption was that the rhythmic differences between languages are large enough that speakers of one language (e.g. English) rarely produce the rhythms of another (e.g. French). So, we collected data from five languages to test this idea. First, we investigated all 15 of the techniques that other researchers have published; they were based on the duration of speech sounds, in analogy with musical rhythm.
Somewhat surprisingly, we found that people who speak one language often produce the duration patterns of another: for instance, French speakers whose patterns of sound duration were typical of Greek. This will lead to a reassessment of some research that assumed that each language has a distinctive rhythm. Rather, our data suggest that each person has their own rhythmic style, and the language they speak influences that style. We also showed that the old idea of dividing languages into two clear rhythmic classes is too simple: there are more than two kinds of rhythm.
When we say that "music, poetry and language all have rhythms", what is meant by "rhythm"? What accounts for the rhythmic differences between languages or dialects? We do not have a detailed understanding of what constitutes rhythm in language, though evidence suggests that it is related to patterns of vowel length, stress/accent, and the complexity of consonant clusters.
When we started this project, we hoped that understanding rhythm might also have practical applications: speech rhythm has been studied as an aid in diagnosing depression, predicting suicide risk, and distinguishing depression from Parkinson's disease. Unfortunately, we seem to have proved that medical applications will be difficult and/or impractical. Individual variation is large, even amongst normal individuals, so one would have to find very large differences between people with medical problems and the normal group for these approaches to be useful. Also, practical clinical applications would be much easier if unscripted speech could be used. Unfortunately, the rhythm measures we studied do vary substantially as the text changes, and early results suggest that the variation cannot be predicted well by simple approaches. However, bear in mind that these conclusions apply primarily to the duration-based rhythm measures that we studied intensively. It is possible that other properties of speech may still be useful as diagnostic tools for these problems.
Within the last decade, tools for quantitative measurements of rhythm have begun to appear. So far, these rhythm measures require a great deal of careful manual marking of the speech, and they are highly dependent on the choice of words. Mostly, they have been limited to carefully designed laboratory experiments.
These tools are formulae that compare the lengths of vowels, consonants, and nearby sounds. This project will start with those tools, test and sharpen them, and then apply them to a linguistic question: to what extent do British English dialects share a common rhythm?
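To give a concrete flavour of such formulae, two widely used duration-based measures are %V (the fraction of an utterance's duration that is vocalic) and the normalised pairwise variability index (nPVI), which compares each interval's duration with that of its successor. The sketch below is illustrative only: the input format (a list of (label, duration) pairs) and the durations are assumptions, not the project's data or code.

```python
# Two standard duration-based rhythm measures, sketched in Python.
# Input format is an assumption: (label, duration) pairs, where the
# label is 'V' for a vocalic interval and 'C' for a consonantal one.

def percent_v(segments):
    """%V: percentage of total duration spent in vocalic intervals."""
    total = sum(dur for _, dur in segments)
    vocalic = sum(dur for label, dur in segments if label == 'V')
    return 100.0 * vocalic / total

def npvi(durations):
    """nPVI: mean normalised difference between successive durations."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / len(pairs)

# Invented example: alternating consonantal and vocalic intervals (seconds).
segments = [('C', 0.08), ('V', 0.12), ('C', 0.15), ('V', 0.06),
            ('C', 0.09), ('V', 0.20)]
vowel_durations = [dur for label, dur in segments if label == 'V']
print(percent_v(segments))      # %V for this stretch of speech
print(npvi(vowel_durations))    # vocalic nPVI
```

Changing the words changes these numbers, which is exactly the text-dependence discussed below.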
To sharpen our tools, we will systematically test variants of the rhythm measures and see which variants work best. The resulting formula will be our tool for measuring rhythm. Variants will be tested by checking that they have the right properties: that a person should have a reasonably consistent rhythm from hour to hour, that people who speak the same dialect will share broad rhythmic properties, and that different dialects and languages can have different rhythms. We do this by building machine classifiers that try to separate data into groups based on differences in the rhythm measures; the larger the rhythmic differences are, the more successful the classifier will be. If two languages have distinct rhythms, the classifier will be able to name the correct language when given only the rhythm measures; if dialects overlap, the classifier will be little better than a guess.
So, we feed each variant of the rhythm measures into a classifier: if the combination can accurately and reliably distinguish different languages, good. If not, we look for a better version. This way, we can systematically search for better and better rhythm measures. When this is done, we will have learned something even before we use the tool: we can look at the formulae for the rhythm measures and see which acoustic characteristics are most important. The acoustic properties that carry the most weight in the formula are presumably also important to human listeners.
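As a rough illustration of this search (a sketch under assumptions, not the project's actual code), each candidate variant can be treated as a function from an utterance to a feature vector, with cross-validated classification accuracy over language labels as the score to maximise:

```python
# Sketch of the search over rhythm-measure variants. Assumes each
# variant is a function mapping an utterance to a feature vector and
# that every utterance carries a language label; the classifier choice
# (logistic regression) is also an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score_variant(variant, utterances, languages):
    """Cross-validated accuracy of a classifier fed one variant's features.

    Higher accuracy means this variant captures larger rhythmic
    differences between the languages.
    """
    X = np.array([variant(u) for u in utterances])
    y = np.array(languages)
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

def best_variant(variants, utterances, languages):
    """Keep the variant whose classifier separates the languages best."""
    return max(variants, key=lambda v: score_variant(v, utterances, languages))
```

A variant that scores no better than chance is discarded; one that separates the languages reliably becomes a candidate rhythm measure.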
Human language is mostly unscripted conversation. However, to use this kind of data, we need to understand how the rhythm measures depend on the choice of words. They do depend on it, because rhythm measures are built from the lengths of vowels: swapping a short vowel ("bit") for a long one ("beet") changes the measured value. Existing work has largely avoided this problem by requiring subjects to read a prepared text, but we plan to put rhythm experiments using unscripted speech onto a sound footing.
To do so, we will correlate changes in rhythm measures with changes in the phonological properties of the text by learning which combinations of speech sounds increase or decrease different rhythm measures. If freed from the need for a script, clinical applications might become easier. From a theoretical point of view, we can also use these improved rhythm measurements to check our understanding of the phonetics of rhythm.
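A hedged sketch of that correlation step, with hypothetical predictors (the feature set and all numbers below are invented for illustration): fit a linear model from phonological properties of the text to a rhythm measure, and treat the residual as a text-corrected measure that could be applied to unscripted speech.

```python
# Sketch: predict a rhythm measure from phonological properties of the
# text. The three predictors are hypothetical examples, not the
# project's actual feature set, and all numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# One row per utterance: [proportion of long vowels,
#                         mean consonant-cluster size,
#                         proportion of open syllables]
text_features = np.array([
    [0.30, 1.8, 0.55],
    [0.10, 2.4, 0.40],
    [0.45, 1.2, 0.70],
    [0.25, 2.0, 0.50],
])
rhythm_measure = np.array([62.0, 48.0, 71.0, 58.0])  # e.g. nPVI values

model = LinearRegression().fit(text_features, rhythm_measure)
print(model.coef_)  # which combinations of sounds push the measure up or down

# Subtracting the text-driven prediction leaves the part of the measure
# that reflects the speaker rather than the script.
residual = rhythm_measure - model.predict(text_features)
```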
After this preliminary work, we plan to produce the first quantitative survey of the rhythm of British English dialects. Are their rhythms closely related or not? How much closer together are the rhythms of dialects than those of separate languages? It is an open question, made more interesting by recent work showing that the intonation of British dialects can be very different. This will provide basic data to help our understanding of how dialects evolve and interact with each other.
Once the data were in, we found that languages do differ in their phonotactics, as computed from canonical transcriptions, but that these phonological properties do a surprisingly poor job of predicting differences in rhythm measures. Instead, the bulk of the differences were speaker-to-speaker differences. These results were reported in conference papers by Keane et al. (2010) and Loukina et al. (2010a); a journal paper is in progress.
We found that the language-to-language differences in rhythm are no larger than the person-to-person differences. For example, one can find French speakers who use Greek rhythms, and similarly for most of our language pairs. Together with the previous result, this suggests that phonological descriptions may overstate the differences between languages, possibly because each language is transcribed in its own way; this may affect the interpretation of many cross-linguistic studies in linguistics.
While most pairs of our languages could be separated fairly well with a classifier based on just one rhythm measure (RM), each pair typically needed a different RM. Combinations of three RMs were needed to maximise the identification rate for all five languages at once. In addition to these statistical results, our data show that the languages group differently depending upon which RMs are used to classify them. Thus, languages differ in several different ways and there are no absolute rhythm classes. These findings (supported by a multidimensional scaling analysis) show that linguistic rhythm is multidimensional. These results were reported in a conference paper (Loukina et al. 2009) and have been submitted to JASA as a journal paper. We plan to propose a perceptual study to validate these results; they substantially improve our understanding of rhythmic typology and methodology.
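For illustration, the multidimensional scaling step can be sketched as follows (the dissimilarity matrix is invented, and the language labels other than French and Greek are placeholders): languages are embedded in a plane so that distances reflect how separable each pair is under a chosen set of RMs.

```python
# Sketch of a multidimensional scaling (MDS) analysis of language
# separability. The dissimilarity matrix is invented for illustration;
# a real one might come from pairwise classifier error rates.
import numpy as np
from sklearn.manifold import MDS

languages = ['French', 'Greek', 'Language3', 'Language4', 'Language5']
dissimilarity = np.array([
    [0.0, 0.4, 0.7, 0.5, 0.8],
    [0.4, 0.0, 0.6, 0.5, 0.7],
    [0.7, 0.6, 0.0, 0.4, 0.6],
    [0.5, 0.5, 0.4, 0.0, 0.7],
    [0.8, 0.7, 0.6, 0.7, 0.0],
])
coords = MDS(n_components=2, dissimilarity='precomputed',
             random_state=0).fit_transform(dissimilarity)
for language, (x, y) in zip(languages, coords):
    print(f'{language}: ({x:+.2f}, {y:+.2f})')
# Rebuilding the matrix from a different set of RMs yields a different
# grouping of the same languages -- the sense in which rhythm is
# multidimensional.
```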
Going beyond existing rhythm measures that are (primarily) focussed on segment duration, we developed a novel technique, defining the strength of rhythm as the predictability of acoustic properties. Testing this, we confirmed that it allowed us to separate poetry from prose effectively in all five languages. Notably, patterns of segment duration are among the least predictable of the properties we investigated. These results were reported in conference papers Kochanski et al. (2010a) and Kochanski et al. (2010b). This work may lead to an entirely new class of rhythm measures.
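One way to make "rhythm as predictability" concrete (a sketch under assumptions, not the estimator from the papers): linearly predict each point of an acoustic contour, such as a loudness track, from the k preceding points, and take the variance explained as the strength of rhythm.

```python
# Sketch: rhythm strength as the predictability of an acoustic contour.
# The k-th order linear predictor is an illustrative choice, not the
# method from the published papers.
import numpy as np

def rhythm_strength(contour, k=4):
    """R^2 of a k-th order linear predictor over the contour."""
    x = np.asarray(contour, dtype=float)
    # Lagged design matrix: each row holds k past values; the target is
    # the value that follows them. A constant column adds an intercept.
    X = np.array([x[i:i + k] for i in range(len(x) - k)])
    X = np.column_stack([X, np.ones(len(X))])
    y = x[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    return 1.0 - residual.var() / y.var()

t = np.arange(200)
print(rhythm_strength(np.sin(2 * np.pi * t / 10)))  # periodic: close to 1
print(rhythm_strength(np.random.default_rng(0).normal(size=200)))  # noise: near 0
```

A strongly rhythmic signal is easy to predict from its own recent past; an arrhythmic one is not, which is what lets such a measure separate poetry from prose.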
We found that English dialects have overlapping rhythms. In agreement with findings from the main corpus, we found that different groupings of dialects could be observed depending on the choice of rhythm measure(s). Based on RQ2b, we also developed new loudness-based measures. These separated dialects better than chance and revealed yet other groupings. These results were reported in a conference paper, Loukina and Kochanski (2010).
Our work has highlighted the subjectivity of manual labelling. We have shown that acoustic properties of segments do not always match their expected phonological or even phonetic category. These differences are language-specific and suggest that rhythm measures based on manual labelling are strongly affected by the conventional phonological interpretation of sounds in that language. These results are in the journal paper.

See the official ESRC website for project RES-062-23-1323, which contains project details and lists papers related to this project. Other results can be found at http://kochanski.org/gpk/papers or on the Oxford Library Service website. Besides the scientific papers, we have released our data collection software, and the corpus of speech recordings is available at http://www.phon.ox.ac.uk/corpus . Here's the final report.