Prosody and Prosodic Models
ICSLP 2002 - September 16, 2002, Denver Colorado
Chilin Shih and Greg Kochanski

Bibliography

Web Sites

Recent Books

Intonation Models and Modeling Techniques

Tone and Accent Alignment

Emotions

Acoustic Correlates of Stress and Accent

Physiological Explanation, Articulatory Modeling

Discourse

Segmental Effects

Prosody Markup Language

Representation of Prosody, Language Description

Web Sites

Recent Books

Horne, M., (ed). (2000). Prosody: Theory and Experiment. Studies Presented to Gösta Bruce. Kluwer Academic Publishers, Dordrecht.
Botinis, A., (ed). (2000). Intonation: Analysis, Modelling and Technology. Kluwer Academic Publishers, Dordrecht.
Sagisaka, Y., Campbell, W., Higuchi, N. (eds.) (1998). Computing Prosody: Computational Models for Processing Spontaneous Speech. Springer-Verlag, Berlin.
Hirst, D., Di Cristo, A. (eds.) (1998). Intonation Systems: A Survey of Twenty Languages. Cambridge University Press.
Stevens, K. (1998). Acoustic Phonetics. The MIT Press, Cambridge Mass.
Ladd, D. R. (1996). Intonational Phonology. Cambridge University Press, Cambridge.

Intonation Models and Modeling Techniques

Anderson, M., Pierrehumbert, J., and Liberman, M. (1984). Synthesis by rule of English intonation patterns. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 2.8.1-2.8.4, San Diego, CA, USA. ICASSP.
Black, A. W. and Hunt, A. J. (1996). Generating f0 contours from ToBI labels using linear regression. Proceedings of ICSLP 96, Philadelphia, PA, USA.
Chen, S.-H., Hwang, S. H., and Tsai, C.-Y. (1992). A first study of neural net based generation of prosodic and spectral information for Mandarin text-to-speech. Proceedings of IEEE ICASSP, volume 2, pages 45-48.
de Pijper, J. R. (1983). Modelling British English Intonation. Foris Publications, Dordrecht, Holland.
Dusterhoff, K. E., Black, A. W., and Taylor, P. Using decision trees within the tilt intonation model to predict f0 contours. In Eurospeech.
Fujisaki, H. A note on the physiological and physical basis for the phrase and accent components in the voice fundamental frequency contour. In Fujimura, O., editor, Vocal Fold Physiology: Voice Production, Mechanisms and Functions, pages 347-355. Raven, New York.
Fujisaki, H. (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In MacNeilage, P. F., editor, The Production of Speech, pages 39-55. Springer-Verlag.
Hirst, D. J., Di Cristo, A., and Espesser, R. Levels of representation and levels of analysis for the description of intonation systems. In Horne, M., (ed.), Prosody: Theory and Experiment. Studies Presented to Gösta Bruce, pages 51-87. Kluwer Academic Publishers, Dordrecht.
Kochanski, G. P. and Shih, C. (2002). Soft templates for prosody mark-up. Speech Communication. In press.
Levitt, H. and Rabiner, L. R. Analysis of fundamental frequency contours in speech. Journal of the Acoustical Society of America, 49(2):570.
Liberman, M. Y. and Pierrehumbert, J. B. (1984). Intonational invariance under changes in pitch range and length. In Aronoff, M. and Oehrle, R., editors, Language Sound Structure, pages 157-233. M.I.T. Press, Cambridge, Massachusetts.
Malfrère, F., Dutoit, T., and Mertens, P. (1998). Fully automatic prosody generator for text-to-speech. In Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia.
Öhman, S. (1967). Word and sentence intonation, a quantitative model. Technical report, Department of Speech Communication, Royal Institute of Technology (KTH).
Olive, J. P. Fundamental frequency rules for the synthesis of simple declarative English sentences. Journal of the Acoustical Society of America, 57:476-482.
Pan, S., McKeown, K., Hirschberg, J. (2001). Semantic Abnormality and its Realization in Spoken Language. Proceedings of Eurospeech 2001, Aalborg, Denmark.
Ross, K. N. and Ostendorf, M. (1999). A dynamical system model for generating fundamental frequency for speech synthesis. IEEE Transactions on Speech and Audio Processing, 7(3):295-309.
Taylor, P. A. Analysis and synthesis of intonation using the tilt model. Journal of Acoustical Society of America, 107(3):1697-1714.
Taylor, P. A. (1998). The tilt intonation model. In Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia.

Tone and Accent Alignment

Arvaniti, A., Ladd, D. R., Mennen, I. (1998). Stability of Tonal Alignment: the case of Greek Prenuclear Accents. Journal of Phonetics 26: 3-25.
Ladd, D. R., Faulkner, D., Faulkner, H., Schepman, A. (1999). Constant segmental anchoring of F0 movements under changes in speech rate. Journal of the Acoustical Society of America 106, 1543-1554.
Ladd, D. R., Mennen, I., Schepman, A. (2000). Phonological conditioning of peak alignment of rising pitch accents in Dutch. Journal of the Acoustical Society of America 107, 2685-2696.
Pierrehumbert, J., Steele, S. (1990). Categories of Tonal Alignment in English. Phonetica. pp. 181-196.
Pierrehumbert, J. (2000). Tonal elements and their alignment. In M. Horne, (ed.) Prosody: Theory and Experiment. Studies Presented to Gösta Bruce. Kluwer, Dordrecht.
Prieto, P., Nibert, H., Shih, C. (1995). Effects of Phrasal Length and Time Distance between Peaks on Peak Height in Mexican Spanish. International Conference on Spoken Language Processing, pp. 730-733.
Prieto, P., van Santen, J., Hirschberg, J. (1994) Patterns of F0 peak placement in Mexican Spanish. Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, pp. 30-34.
Silverman, K. and Pierrehumbert, J. (1990). The Timing of Prenuclear High Accents in English. In Papers in Laboratory Phonology I, J. Kingston and M. Beckman, (eds), Cambridge University Press, Cambridge UK. 72-106.
van Santen, J. P. H., Möbius, B. (2000). A quantitative model of f0 generation and alignment. In Botinis, A., editor, Intonation: Analysis, Modelling and Technology, pp. 269-288. Kluwer Academic Publishers.
van Santen, J. P. H. and Möbius, B. (1997). Modeling pitch accent curves. In Intonation: Theory, Models, and Applications. Proceedings of ESCA Workshop, pp. 321-324, Athens, Greece.
Xu, Y. (1998). Consistency of tone-syllable alignment across different syllable structures and speaking rates. Phonetica 55: 179-203.
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics 27: 55-105.
Xu, C. X., Xu, Y., and Luo, L. S. (1999). A pitch target approximation model for f0 contours in Mandarin. Proceedings of the 14th International Congress of Phonetic Sciences, pp. 2359-2362, San Francisco.
Xu, Y. and Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Communication 33: 319-337.

Emotions

Web sites
- Demonstrations of synthesized emotional speech, Felix Burkhardt
- Emotional speech Homepage, DSPLAB, University of Maribor, Slovenia
- Affective Computing, Research on Human Emotions, MIT
- Emotional and Expressive Synthesized Speech, Janet Cahn, MIT
Books and Papers:
Proceedings of the ISCA Workshop on Speech and Emotion. Northern Ireland, 2000.
Adolphs, R., Tranel, D., Damasio, H. (2002). Neural Systems for Recognizing Emotion from Prosody. Emotion 2: 23-51.
Amir, N., Ron, S., (1998). Towards an automatic classification of emotions in speech. Proceedings of ICSLP 98. Sydney, Australia.
Cahn, J. E. (1989). Generating Expression in Synthesized Speech. Master's Thesis, MIT.
Cauldwell, R. T. (2000). Where did the anger go? The role of context in interpreting emotion in speech. ISCA Workshop on Speech and Emotion, A conceptual framework for research. Northern Ireland.
Cosmides, L. (1983). Invariances in the acoustic expression of emotion during speech. Journal of Experimental Psychology: Human Perception and Performance 9, pp. 864-881.
Cowie, R., Douglas-Cowie, E. (1996). Automatic Statistical Analysis of the Signal and Prosodic Signs of Emotion in Speech. Proceedings of ICSLP 96. Philadelphia.
Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing Emotion in Speech. Proceedings of ICSLP 96. Philadelphia, USA.
Ekman, P. (1995). The Nature of Emotion--Fundamental Questions. Oxford University Press.
Hauser, M. (1996). Information about affective state. The Evolution of Communication. MIT Press, pp. 476-496.
Heuft, B., Portele, T., Rauth, M. (1996). Emotions in Time Domain Synthesis. Proceedings of ICSLP 96. Philadelphia, USA.
Johnstone, I. T., Banse, R., Scherer, K. R. (1995) Acoustic Profiles from Prototypical Vocal Expressions of Emotion. Proceedings of the 13th International Congress of Phonetic Sciences.
Maekawa, K. (1998). Phonetic and phonological characteristics of paralinguistic information in spoken Japanese. In Proceedings of the International Conference on Spoken Language Processing.
Montero, J. M., Gutiérrez-Arriola, J., Palazuelos, S., Enriquez, E., Aguilera, S., Pardo, J. M. (1998). Emotional Speech Synthesis: From Speech Database to TTS. Proceedings of ICSLP 98. Sydney, Australia.
Mozziconacci, S. J. L., Hermes, D. J. (1999). Role of Intonation Patterns in Conveying Emotion in Speech. ICPhS 99.
Mozziconacci, S. J. L. (1998). Speech Variability and Emotion: Production and Perception. Ph.D. Thesis. Technical University Eindhoven.
Murray, I. R., Edgington, M. D., Campion, D., Lynn, J. (2000). Rule-based Emotion Synthesis using Concatenated Speech. ISCA Workshop on Speech and Emotion, A conceptual framework for research. Northern Ireland.
Murray, I. R., Arnott, J. L. (1993). Toward the Simulation of Emotion in Synthetic Speech: A Review of the Literature on Human Vocal Emotion. Journal of the Acoustical Society of America 93, 1097-1108.
Murray, I.R., Arnott, J. L. (1995). Implementation and testing of a system for producing emotion-by-rule in synthetic speech. Speech Communication 16, pp. 369-390.
Ohala, J. (1996). Ethological Theory and the Expression of Emotion in the Voice. Proceedings of ICSLP 96. Philadelphia.
Pfeifer, R. (1988). Artificial Intelligence Models of Emotion. Cognitive Perspectives on Emotion and Motivation. V. Hamilton et al. (eds.) Kluwer Academic Publishers.
Picard, R. W. (1997) Affective Computing. The MIT Press.
Rank, E., Pirker, H. (1998). Generating Emotional Speech with a Concatenative Synthesizer. Proceedings of ICSLP 98. Sydney, Australia.
Schröder, M. (2001). Emotional Speech Synthesis--a Review. Proceedings of Eurospeech 2001. Aalborg. pp. 561-564.
Schröder, M., Cowie, R., Douglas-Cowie, E., Westerdijk, M., Gielen, S. (2001). Acoustic Correlates of Emotion Dimensions in View of Speech Synthesis. Proceedings of Eurospeech 2001, pp. 87-90. Aalborg, Denmark.
Stibbard, R.M. (2001). Vocal Expression of Emotions in Non-laboratory Speech: An Investigation of the Reading/Leeds Emotion in Speech Project Annotation Data. Ph.D. thesis. University of Reading.
Williams, C. E., Stevens, K. N. (1972). Emotions and Speech: Some Acoustical Factors. Journal of the Acoustical Society of America 52, 1238-1250.

Acoustic Correlates of Stress and Accent

Beckman, M. E. (1986). Stress and Non-Stress Accent. (Netherlands Phonetic Archives No. 7). Foris. Second printing, 1992, by Walter de Gruyter.
Beckman, M. E., Cohen, K. B. (2000). Modeling the articulatory dynamics of two levels of stress contrast. In M. Horne, (ed.) Prosody: Theory and Experiment. Studies Presented to Gösta Bruce, pp. 169-200. Kluwer.
Erickson, D. (1998). Effects of contrastive emphasis on jaw opening. Phonetica, 55:147-169.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27:765-769.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1:126-152.
Gårding, E., Fujimura, O., and Hirose, H. (1970). Laryngeal control of Swedish word tones. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 27:135-149.
Gussenhoven, C., Repp, B.H., Rietveld, A., Rump, H. H., Terken, J. (1997). The perceptual prominence of fundamental frequency peaks. JASA 102, pp. 3009-3022.
Hillenbrand, J. M. and Houde, R. A. (1996). Role of f0 and amplitude in the perception of intervocalic glottal stops. Journal of Speech and Hearing Research, 39:1182-1190.
Kehoe, M., Stoel-Gammon, C., and Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38:338-350.
Lieberman, P. (1960). Some acoustic correlates of word stress in American-English. Journal of Acoustical Society of America, 32:451-454.
Moon, S.-J. and Lindblom, B. (1990). Interaction between duration, context, and speaking style in English stressed vowels. Journal of Acoustical Society of America, pages 40-55.
Pollock, K. E., Brammer, D. M., and Hageman, C. F. (1990). An acoustic analysis of young children's productions of word stress. Journal of Phonetics, 21:183-203.
Simada, Z. B. and Hirose, H. Physiological correlates of Japanese accent patterns. In Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, volume 5, pages 41-49.
Sluijter, A. M. C. and van Heuven, V. J. Spectral balance as an acoustic correlate of linguistic stress. Journal of Acoustical Society of America, 100(4):2471-2485.
Sluijter, A. M. C., van Heuven, V. J., and Pacilly, J. J. A. Spectral balance as a cue in the perception of linguistic stress. Journal of Acoustical Society of America, 101(1):503-513.
Terken, J. (1991). Fundamental frequency and perceived prominence of syllables. JASA 89, pp. 1768-1776.
Terken, J. and Hermes, D. (2000). The perception of prosodic prominence. In M. Horne (ed.) Prosody: Theory and Experiment. Studies Presented to Gösta Bruce, pp. 89-127. Kluwer Academic Publishers, Dordrecht.

Physiological Explanation, Articulatory Modeling

Atkinson, J. E. (1978). Correlation analysis of the physiological factors controlling fundamental voice frequency. Journal of Acoustical Society of America, 63:211-222.
Browman, C. P. and Goldstein, L. (1990). Tiers in articulatory phonology, with some implications for casual speech. In Kingston, J. and Beckman, M., editors, Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, pages 341-376. Cambridge University Press.
Berry, D. A., Herzel, H., Titze, I. R., and Story, B. H. Bifurcations in excised larynx experiments. Journal of Voice.
Herman, R., Beckman, M., Honda, K. (1999). Linguistic models of F0 use, physiological models of F0 control, and the issue of "Mean Response Time". Language and Speech 42, 373-399.
Herzel, H. (1995). Non-linear dynamics of voiced speech. In Awrejcewicz, J., editor, Nonlinear Dynamics: New Theoretical and Applied Results. Akademie Verlag.
Hollien, H. (1981). In search of vocal frequency control mechanisms. In Bless, D. M. and Abbs, J. H., editors, Vocal Fold Physiology: Contemporary Research and Clinical Issues, pages 361-367. College-Hill Press, San Diego, CA.
Keating, P. A. The window model of coarticulation: articulatory evidence. Papers in Laboratory Phonology I. Between the Grammar and Physics of Speech, pages 451-470. Cambridge University Press.
Ladefoged, P. (1962). Subglottal activity during speech. 4th International Congress of Phonetic Science, pages 247-265.
McFarland, D. H. and Smith, A. (1992). Effects of vocal task and respiratory phase on prephonatory chest-wall movements. Journal of Speech and Hearing Research, 35(5):971-982.
Lieberman, P., Knudson, R., and Mead, J. Determination of the rate of change of f0 with respect to subglottal air pressure during sustained phonation. Journal of Acoustical Society of America, 45:1537-1543.
Löfqvist, A., Baer, T., McGarr, N. S., and Story, R. S. (1989). The cricothyroid muscle in voicing control. Journal of the Acoustical Society of America, 85:1314-1321.
Monsen, R. B., Engebretson, A. M., and Vemula, N. R. (1978). Indirect assessment of the contribution of subglottal air pressure and vocal fold tension to changes in the fundamental frequency in English. Journal of the Acoustical Society of America, 64(1):65-80.
Munhall, K. and Löfqvist, A. (1992). Gestural aggregation in speech: laryngeal gestures. Journal of Phonetics, 20:111-126.
Perrier, P., Ostry, D. J., and Laboissière, R. The equilibrium point hypothesis and its application to speech motor control. Journal of Speech and Hearing Research, 39:365-378.
Pierrehumbert, J. (1997). Consequences of Intonation for the Voice Source. In S. Kiritani, H. Hirose, and H. Fujisaki (eds.) Speech Production and Language. Mouton de Gruyter, Berlin. 111-131.
Strik, H. (1994). Physiological control and behaviour of the voice source in the production of prosody. Ph.D. dissertation, University of Nijmegen.
Strik, H., Boves, L. (1992) Control of fundamental frequency, intensity and voice quality in speech. Journal of Phonetics 20, pp. 15-25.
Strik H., Boves, L. (1992) On the relation between voice source parameters and prosodic features in connected speech. Speech Communication 11, pp. 167-174.
Strik, H., Boves, L. (1995) Downtrend in F₀ and P_sb. Journal of Phonetics 23, pp. 203-220.
Titze, I. R. On the relation between subglottal pressure and fundamental frequency in phonation. Journal of Acoustical Society of America, 85(2):901-906.
Titze, I. R. The physics of small amplitude oscillation of the vocal folds. Journal of Acoustical Society of America, 83(4):1536-1552.
Titze, I. R. (1993). Principles of Voice Production. Prentice-Hall.
Whalen, D. H. and Kinsella-Shaw, J. M. (1997). Exploring the relationship of inspiration duration to utterance duration. Phonetica, 54:138-152.
Wier, C. C., Jesteadt, W., and Green, D. M. Frequency discrimination as a function of frequency and sensation level. Journal of Acoustical Society of America, 61:178-184.
Wilder, C. N. (1981). Chest wall preparation for phonation in female speakers. In Bless, D. M. and Abbs, J. H., editors, Vocal Fold Physiology: Contemporary Research and Clinical Issues, pages 109-123. College-Hill Press, San Diego, CA. ISBN 0-933014-87-2.
Winkworth, A. L., Davis, P. J., Adams, R. D., and Ellis, E. (1995). Breathing patterns during spontaneous speech. Journal of Speech and Hearing Research, 38(1):124-144.
Winkworth, A. L., Davis, P. J., Ellis, E., and Adams, R. D. (1994). Variability and consistency in speech breathing during reading: lung volumes, speech intensity, and linguistic factors. Journal of Speech and Hearing Research, 37(3):535-556.
Xu, Y. and Sun, X. (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America 111: 1399-1413.

Discourse

Bolinger, D. (1989). Intonation and its uses: Melody in grammar and discourse. Stanford University Press.
Hirschberg, J. (1992). Some Intonational Characteristics of Discourse Structure. ICSLP-92.
Hirschberg, J. (1993). Pitch Accent in Context: Predicting Intonational Prominence from Text. Artificial Intelligence, 63(1/2), pp. 305-340.
Hirschberg, J., Nakatani, C. (1996). A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics. pp. 286-293.
Hirschberg, J. and Pierrehumbert, J. (1986). The intonational structuring of discourse. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, volume 24, pages 136-144.
Hirschberg, J., Litman, D. (1994). Empirical Studies on the Disambiguation of Cue Phrases. Computational Linguistics, 19 (3), pp. 501-530.
Grosz, B., Hirschberg, J. (1992). Some intonational characteristics of discourse structure. Proceedings of ICSLP 92, V. 1. Banff, Canada. 429-432.
Nakajima, S., Allen, J. F. (1993). A study on prosody and discourse structure in cooperative dialogues. Phonetica 50, pp. 197-210.
Nakatani, C. H., Hirschberg, J. (1994). A Corpus-based study of repair cues in spontaneous speech. Journal of the Acoustical Society of America, 95(3), 1603-1616.
Hirschberg, J., Avesani, C. (1997). The Role of Prosody in Disambiguating Potentially Ambiguous Utterances in English and Italian. ESCA Tutorial and Research Workshop on Intonation. Athens, pp.189-192.
Pierrehumbert, J., Hirschberg, J. (1990). The Meaning of Intonation in the Interpretation of Discourse. In P. Cohen, J. Morgan, and M. Pollack, (eds.) Intentions in Communication. MIT Press, Cambridge MA. 271-311.
Swerts, M., Geluykens, R. (1993). The prosody of information units in spontaneous monologue. Phonetica 50, pp. 189-196.
Swerts, M., Geluykens, R. (1994). Prosody as a marker of information flow in spoken discourse. Language and Speech 37(1), pp. 21-43.
Swerts, M., Hirschberg, J. (eds) (1999). Prosody and conversation. Special double issue of Language and Speech on Prosody and Conversation, 41:3/4.
Swerts, M. (1997). Prosodic features at discourse boundaries of different strength. Journal of the Acoustical Society of America 101 (1), pp. 514-521.
Swerts, M., Ostendorf, M. (1997). Prosodic and lexical indications of discourse structure in human-machine interactions. Speech Communication 22, pp. 25-41.
Terken, J., Hirschberg, J. (1994). Deaccentuation of words representing 'Given' information: Effects of persistence of grammatical role and surface position. Language and Speech 37, pp. 125-145.
Terken, J. and Nooteboom, S.G. (1987). Opposite effects of accentuation and deaccentuation on verification latencies for Given and New information. Language and Cognitive Processes 2, pp.145-163.
Terken, J. (1985). Communicative Functions of Pitch Accents. Some experiments. Ph.D. thesis, Leiden University.
Terken, J. (1984). The Distribution of Accents in Instructions as a Function of Discourse Structure. Language and Speech 27, pp. 269-289.
Ward, G., Hirschberg, J. (1985). Implicating uncertainty: The pragmatics of fall-rise. Language 61, pp. 747-776.

Segmental Effects

Haggard, M., Ambler, S., and Callow, M. Pitch as a voicing cue. Journal of the Acoustical Society of America, 47:613-617.
Hombert, J.-M. Consonant types, vowel quality and tone. In Fromkin, V. A., editor, Tone: A Linguistic Survey, pages 77-111. Academic Press, New York.
Lea, W. (1973). Segmental and suprasegmental influences on fundamental frequency contours. In Hyman, L., editor, Consonant Types and Tones, pages 15-70. University of Southern California, Los Angeles.
Liberman, M., Shadle, C. H., Pierrehumbert, J. B. The intrinsic pitch of vowels in sentence context. JASA 66.
Massaro, D. W. and Cohen, M. M. (1976). The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction. Journal of Acoustical Society of America, 60:704-717.
Silverman, K. E. (1987). The Structure and Processing of Fundamental Frequency Contours. Ph.D. thesis, University of Cambridge.
Terken, J. (1995). The perceptual relevance of micro-intonation: Enhancing the Voicing Distinction in Synthetic Speech by means of consonantal F0 perturbation. Studies in applied linguistics 2, pp. 103-124.
Umeda, N. (1981). Influence of segmental factors on fundamental frequency in fluent speech. Journal of the Acoustical Society of America 70, pp. 350-355.

Prosody Markup Language

Bird, S. and Liberman, M. (2001). A Formal Framework for Linguistic Annotation. Speech Communication 33(1-2), pp. 23-60.
Kochanski, G. P. and Shih, C. (2002). Soft templates for prosody mark-up. Speech Communication. In press.
Sproat, R., Hunt, A., Ostendorf, M., Taylor, P., Black, A., and Lenzo, K. (1998). Sable: A standard for TTS markup. In Proceedings of the International Conference on Spoken Language Processing, pages 1719-1724.
Taylor, P. and Isard, A. SSML: A speech synthesis markup language. Speech Communication, 21:123-133.

Representation of Prosody, Language Description

Beckman, M. E. (1996). The parsing of prosody. Language and Cognitive Processes, 11, pp. 17-67.
Beckman, M. E., Edwards, J. (1990). Lengthenings and shortenings and the nature of prosodic constituency. In J. Kingston & M. E. Beckman, (eds.) Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, pp. 152-178. Cambridge University Press.
Beckman, M. E., Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating, (ed.), Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III, pp. 7-33. Cambridge University Press.
Beckman, M. E., & Edwards, J. (1992). Intonational categories and the articulatory control of duration. In Y. Tohkura, E. Vatikiotis-Bateson, Y. Sagisaka, (eds.), Speech Perception, Production and Linguistic Structure, pp. 356-375. Tokyo: OHM Publishing Co.
Bellegarda, J., Silverman, K., Lenzo, K., and Anderson, V. (2001). Statistical prosodic modeling: from corpus design to parameter estimation. IEEE Transactions on Speech and Audio Processing, 9(1):52-66.
Bolinger, D. L. (1958). A theory of pitch accent in English. Word, 14:109-149.
Bolinger, D. (1986). Intonation and its parts: Melody in Spoken English. Stanford University Press.
Connell, B. A., Ladd, D. R. (1990). Aspects of Pitch Realization in Yoruba. Phonology, 7 1, 1-29.
Chen, Y., Gao, W., Zhu, T., and Ma, J. (2000). Multi-strategy data mining on Mandarin prosodic patterns. ICSLP, Beijing, China.
D'Imperio, M. (to appear). Focus and tonal structure in Neapolitan Italian. Speech Communication.
D'Imperio, M., Rosenthall, S. (1999). Phonetics and phonology of main stress in Italian. Phonology 16 (1), pp. 1-27.
Edwards, J., Beckman, M. E., Fletcher, J. (1991). Articulatory kinematics of final lengthening. Journal of the Acoustical Society of America, 89, pp. 369-382.
Erickson, D., Honda, K., Hirai, H., Beckman, M. E. (1995). The production of low tones in English intonation. Journal of Phonetics, 23(1/2), pp. 179-188.
Gandour, J., Potisuk, S., Dechongkit, S. (1994). Tonal coarticulation in Thai. Journal of Phonetics, 22 (4), pp. 477-492.
Grønnum, N. (1992). The groundworks of Danish intonation: An introduction. Museum Tusculanum Press.
Gårding, E. (1987). Speech act and tonal pattern in standard Chinese: Constancy and variation. Phonetica, 44, pp. 13-29.
Gosy, M., Terken, J. (1994). Question marking in Hungarian: Timing and Height of pitch peaks. Journal of Phonetics 22, pp. 269-281.
Grabe, E., Gussenhoven, C., Haan, J., Marsi, E. C., Post, B. (1997) Preaccentual pitch and speaker attitude in Dutch. Language and Speech 41(1), pp. 63-85.
Grice, M., Ladd, D. R., Arvaniti, A. (2000). On the place of "phrase accents" in intonational phonology. Phonology 17, pp. 143-185.
Hadding-Koch, K. (1961). Acoustico-phonetic studies in the intonation of southern Swedish. Technical report, C. W. K. Gleerup, Lund, Sweden.
House, J., Dankovicová, J., Huckvale, M. (1999). Intonation modelling in Prosynth: An integrated prosodic approach to speech synthesis. International Congress of Phonetic Sciences, San Francisco.
Jilka, M., Möhler, G., Dogil, G. (1999). Rules for the generation of ToBI-based American English intonation. Speech Communication, 28, pp.83-108.
Ni, J. F., Wang, R. H., Hirose, K. (1997). Quantitative analysis and formulation of tone concatenation in Chinese f0 contours. Eurospeech 97. Rhodes, Greece, pp. 195-198.
Ni, J. F., Kawai, G., Hirose, K. (1998). A synthesis-oriented model of phrasal pitch movements in Standard Chinese. ICSLP 98, Sydney, Australia, paper no. 750.
Prieto, P., Hirschberg, J. (1996). Training intonational phrasing rules automatically for English and Spanish Text-to-Speech. Speech Communication, 18, pp. 281-290.
Hirschberg, J., Rambow, O. (2001). Learning Prosodic Features using a Tree Representation. Proceedings of Eurospeech 2001, Denmark.
Jun, S.-A. (forthcoming) Editor. Prosodic Models and Transcription: Towards Prosodic Typology. Oxford University Press.
Jun, S.-A. (1996). The Phonetics and Phonology of Korean Prosody: intonational phonology and prosodic structure. Garland Publishing, New York.
Jun, S.-A., Fougeron, C. (2000). A Phonological Model of French Intonation. In A. Botinis (ed.) Intonation: Analysis, Modelling and Technology. Kluwer Academic Publishers. pp. 209-242.
Kochanski, G. and Shih, C. (2001). Automated modelling of Chinese intonation in continuous speech. Proceedings of Eurospeech 2001, Aalborg, Denmark. International Speech Communication Association.
Kochanski, G. P. and Shih, C. (2000). Stem-ML: Language independent prosody description. Proceedings of the 6th International Conference on Spoken Language Processing, Beijing, China.
Kochanski, G. P., Shih, C., and Jing, H. Y. (2001). Hierarchical structure and word strength prediction in Mandarin prosody. In 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, Scotland.
Laniran, Y. (1992). Intonation in a tone language: the phonetic implementation of tone in Yoruba. Ph.D. Dissertation, Cornell University.
Liberman, M. Y. (197). The intonation system of English. Garland Publishing.
Liberman, M. Y. and Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8:249-336.
Liberman, M., Schultz, J. M., Hong, S., Okeke, V. (1993). The phonetic interpretation of tone in Igbo. Phonetica 50(3), pp. 147-160.
Lieberman, P. (1967). Intonation, Perception and Language. MIT Press, Cambridge, Mass.
Möbius, B. (1993). A quantitative model of German intonation: Analysis and synthesis of fundamental frequency contours. Ph.D. Dissertation, University of Bonn.
Needleman, A. R. (1998). Quantification of context effects in speech perception: influence of prosody. Clinical Linguistics and Phonetics, 12(4):305-327.
Ohala, J. and Hirano, M. (1967). Studies of pitch change in speech. In UCLA Working papers on phonetics, pages 80-84.
Ohala, J. and Ladefoged, P. (1970). Further investigation of pitch regulation in speech. volume 14, pages 12-24.
Ohala, J. J. (1992). The segment, primitive or derived? In Docherty, G. J. and Ladd, D. R., editors, Papers in Laboratory Phonology II: Gesture, Segment, Prosody, pages 166-183. Cambridge University Press.
Pierrehumbert, J. (1979). The Perception of Fundamental Frequency Declination. Journal of the Acoustical Society of America 66, pp. 363-369.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Ph.D. thesis, MIT.
Pierrehumbert, J. B. and Beckman, M. E. (1988). Japanese Tone Structure. The MIT Press.
Prevost, S., Steedman, M. (1994). Specifying intonation from context for speech synthesis. Speech Communication 15, pp. 139-153.
Prieto, P., Nibert, H., Shih, C. (1996). The Absence or Presence of a Declination Effect on the Descent of F0 Peaks? Evidence from Mexican Spanish. In K. Zagona (ed.) Grammatical Theory and Romance Languages. John Benjamins Publishing Company.
Prieto, P. (1998) The Scaling of the L Tone Line in Spanish Downstepping Contours. Journal of Phonetics, 26, pp. 261-282.
Prieto, P., Shih, C., Nibert, H. (1996). Pitch Downtrend in Spanish. Journal of Phonetics 24(4), pp. 445-473.
Prieto, P., Hirschberg, J. (1996). Training Intonational Phrasing Rules Automatically for English and Spanish text-to-speech. Speech Communication, 18, pp. 281-290.
Shattuck-Hufnagel, S., Ostendorf, M., Ross, K. (1994). Stress shift and early pitch accent placement in lexical items in American English. Journal of Phonetics, 22, pp. 357-388.
Shattuck-Hufnagel, S., Turk, A. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, V. 25, No. 2, pp. 193-247.
Shih, C. Mandarin Third Tone Sandhi and Prosodic Structure. In J. Wang and N. Smith (eds). Studies in Chinese Phonology, Mouton de Gruyter, pp. 81-123.
Shih, C. (1997). Declination in Mandarin. Proceedings of the ESCA Intonation Workshop, Athens.
Shih, C. (2000). A declination model of Mandarin Chinese. In Botinis, A., editor, Intonation: Analysis, Modelling and Technology, pages 243-268. Kluwer Academic Publishers.
Shih, C. and Kochanski, G. P. (2000). Chinese tone modeling with Stem-ML. In Proceedings of the sixth International Conference on Speech and Language Processing, Beijing, China.
Shih, C., Kochanski, G. P. (2001). Synthesis of prosodic styles. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Scotland.
Shih, C., Kochanski, G. P. (2001). Prosody control for speaking and singing styles. Eurospeech 2001, pp. 669-672 (no. 1672), Aalborg, Denmark.
Shih, C. (1986). The prosodic domain of tone sandhi in Chinese. PhD thesis, University of California, San Diego.
Shih, C. (1988). Tone and intonation in Mandarin. Working Papers of the Cornell Phonetics Laboratory, Number 3: Stress, Tone and Intonation, pages 83-109. Cornell University.
Shih, C. and Sproat, R. (1992). Variations of the Mandarin rising tone. Proceedings of the IRCS workshop on prosody in natural speech, Technical Report IRCS 92-37, pages 193-200. University of Pennsylvania, Institute for Research in Cognitive Science.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., and Hirschberg, J. (1992). ToBI: A standard for labeling English prosody. Proceedings of the International Conference on Spoken Language Processing, volume 2.
Speer, S., Shih, C., Slowiaczek, M. (1989). Prosodic Structure in Language Understanding: Evidence from Tone Sandhi in Mandarin. Language and Speech, 32(4), pp. 337-354.
Sproat, R. W., editor (1998). Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Kluwer Academic Publishers.
Steedman, M. (1991). Structure and intonation. Language 68, pp. 260-296.
Stevens, K. Phonetic evidence for hierarchies of features. Papers in Laboratory Phonology III, pages 242-258.
Terken, J. (1993). Synthesizing natural-sounding intonation for Dutch: rules and perceptual evaluation. Computer Speech and Language 7, pp. 27-48.
Trísková, H. (ed.) (2001). Tone, Stress and Rhythm in Spoken Chinese. Special issue of Journal of Chinese Linguistics no. 17.
Turk, A. E. and Sawusch, J. R. The processing of duration and intensity cues to prominence. Journal of the Acoustical Society of America, 99(6):3782-3790.
van Santen, J. P. H., Shih, C., and Möbius, B. (1998). Intonation. In R. Sproat (ed.) Multilingual Text-to-Speech Synthesis: The Bell Labs Approach, pp. 141-190. Kluwer Academic Publishers.
Venditti, J. J., Jun, S.-A., Beckman, M. E. (1996). Prosodic cues to syntactic and other linguistic structures in Japanese, Korean, and English. In J. Morgan & K. Demuth, (eds.), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, pp. 287-311. Mahwah, NJ: Lawrence Erlbaum.
Waibel, A. (1988). Prosody and speech recognition. Morgan Kaufmann Publishers, Inc., San Mateo, California.
Wang, M., Hirschberg, J. (1992). Automatic Classification of Intonational Phrase Boundaries. Computer Speech and Language 6, pp. 175-196.
Xu, Y. (1993). Contextual Tonal Variation in Mandarin Chinese. Ph.D. thesis, The University of Connecticut.
Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the Acoustical Society of America 95: 2240-2253.
Xu, Y. (2001). Sources of tonal variations in connected speech. Journal of Chinese Linguistics V. 17. pp. 1-31.
Yuan, J., Shih, C., Kochanski, G. P. (2002). Comparison of Declarative and Interrogative Intonation in Chinese. In Bel, B. and Marlien, I. (eds.) Proceedings of the Speech Prosody 2002 Conference, Aix-en-Provence, Laboratoire Parole et Langage, April 2002, pp. 711-714.
