Horne, M., (ed). (2000). Prosody: Theory and Experiment.
Studies Presented to Gösta Bruce. Kluwer Academic
Publishers, Dordrecht.
Botinis, A., (ed). (2000). Intonation: Analysis, Modelling
and Technology. Kluwer Academic Publishers, Dordrecht.
Sagisaka, Y., Campbell, W., Higuchi, N. (eds.) (1998).
Computing Prosody: Computational Models for Processing
Spontaneous Speech. Springer-Verlag, Berlin.
Hirst, D., Di Cristo, A. (eds.) (1998). Intonation Systems:
A Survey of Twenty Languages. Cambridge University Press.
Stevens, K. (1998). Acoustic Phonetics. The MIT Press,
Cambridge Mass.
Ladd, D. R. (1996). Intonational Phonology. Cambridge
University Press, Cambridge.
Intonation Models and Modeling Techniques
Anderson, M., Pierrehumbert, J., and Liberman, M. (1984).
Synthesis by rule of English intonation patterns. In Proceedings
of the International Conference on Acoustics, Speech, and Signal
Processing, volume 1, pages 2.8.1-2.8.4, San Diego, CA, USA.
ICASSP.
Black, A. W. and Hunt, A. J. (1996). Generating f0
contours from ToBI labels using linear regression. Proceedings
of ICSLP 96, Philadelphia, PA, USA.
Chen, S.-H., Hwang, S. H., and Tsai, C.-Y. (1992). A first
study of neural net based generation of prosodic and spectral
information for Mandarin text-to-speech. Proceedings of IEEE
ICASSP, volume 2, pages 45-48.
de Pijper, J. R. (1983). Modelling British English
Intonation. Foris Publications, Dordrecht, Holland.
Dusterhoff, K. E., Black, A. W., and Taylor, P. Using
decision trees within the tilt intonation model to predict f0
contours. In Eurospeech.
Fujisaki, H. A note on the physiological and physical basis for
the phrase and accent components in the voice fundamental frequency
contour. In Fujimura, O., editor, Vocal Fold Physiology: Voice
Production, Mechanisms and Functions, pages 347-355. Raven, New
York.
Fujisaki, H. (1983). Dynamic characteristics of voice
fundamental frequency in speech and singing. In MacNeilage, P. F.,
editor, The Production of Speech, pages 39-55.
Springer-Verlag.
Hirst, D. J., Di Cristo, A., and Espesser, R. Levels of
representation and levels of analysis for the description of
intonation systems. In Horne, M., (ed.), Prosody: Theory and
Experiment. Studies Presented to Gösta Bruce, pages 51-87.
Kluwer Academic Publishers, Dordrecht.
Kochanski, G. P. and Shih, C. (2002). Soft templates for
prosody mark-up. Speech Communication. In print.
Levitt, H. and Rabiner, L. R. Analysis of fundamental frequency
contours in speech. Journal of Acoustical Society of
America, 49(2):570.
Liberman, M. Y. and Pierrehumbert, J. B. (1984). Intonational
invariance under changes in pitch range and length. In Aronoff, M.
and Oehrle, R., editors, Language Sound Structure, pages
157-233. M.I.T. Press, Cambridge, Massachusetts.
Malfrère, F., Dutoit, T., and Mertens, P. (1998). Fully
automatic prosody generator for text-to-speech. In Proceedings
of the International Conference on Spoken Language Processing,
Sydney, Australia.
Öhman, S. (1967). Word and sentence intonation, a
quantitative model. Technical report, Department of Speech
Communication, Royal Institute of Technology (KTH).
Olive, J. P. Fundamental frequency rules for the synthesis of
simple declarative English sentences. Journal of Acoustical
Society of America, 57:476-482.
Pan, S., McKeown, K., Hirschberg, J. (2001). Semantic
Abnormality and its Realization in Spoken Language. Proceedings
of Eurospeech 2001 Aalborg, Denmark.
Ross, K. N. and Ostendorf, M. (1999). A dynamical system model
for generating fundamental frequency for speech synthesis. IEEE
Transactions on Speech and Audio Processing, 7(3):295-309.
Taylor, P. A. Analysis and synthesis of intonation using the
tilt model. Journal of Acoustical Society of America,
107(3):1697-1714.
Taylor, P. A. (1998). The tilt intonation model. In
Proceedings of the International Conference on Spoken Language
Processing, Sydney, Australia.
Tone and Accent Alignment
Arvaniti, A., Ladd, D. R., Mennen, I. (1998). Stability
of Tonal Alignment: the case of Greek Prenuclear Accents.
Journal of Phonetics 26: 3-25.
Ladd, D. R., Faulkner, D., Faulkner, H., Schepman, A. (1999).
Constant segmental anchoring of F0 movements under changes
in speech rate. Journal of the Acoustical Society of America
106, 1543-1554.
Ladd, D. R., Mennen, I., Schepman, A. (2000). Phonological
conditioning of peak alignment of rising pitch accents in Dutch.
Journal of the Acoustical Society of America 107,
2685-2696.
Pierrehumbert, J., Steele, S. (1990). Categories of Tonal
Alignment in English. Phonetica. pp. 181-196.
Pierrehumbert, J. (1998). Tonal elements and their alignment.
In M. Horne, (ed.) Prosody: Theory and Experiment. Studies
Presented to Gösta Bruce. Kluwer, Dordrecht.
Prieto, P., Nibert, H., Shih, C. (1995). Effects of Phrasal
Length and Time Distance between Peaks on Peak Height in Mexican
Spanish. International Conference on Spoken Language
Processing, pp. 730-733.
Prieto, P., van Santen, J., Hirschberg, J. (1994) Patterns of
F0 peak placement in Mexican Spanish. Proceedings of the Second
ESCA/IEEE Workshop on Speech Synthesis, pp. 30-34.
Silverman, K. and Pierrehumbert, J. (1990). The Timing of
Prenuclear High Accents in English. In Papers in Laboratory
Phonology I, J. Kingston and M. Beckman, (eds), Cambridge
University Press, Cambridge UK. 72-106.
van Santen, J. P. H., Möbius, B. (2000). A quantitative
model of f0 generation and alignment. In Botinis, A., editor,
Intonation: Analysis, Modelling and Technology, pp. 269-288.
Kluwer Academic Publishers.
van Santen, J. P. H. and Möbius, B. (1997). Modeling pitch
accent curves. In Intonation: Theory, Models, and Applications.
Proceedings of ESCA Workshop, pp. 321-324, Athens, Greece.
Xu, Y. (1998). Consistency of tone-syllable alignment across
different syllable structures and speaking rates. Phonetica
55: 179-203.
Xu, Y. (1999). Effects of tone and focus on the formation and
alignment of F0 contours. Journal of Phonetics 27:
55-105.
Xu, C. X., Xu, Y., and Luo, L. S. (1999). A pitch target
approximation model for f0 contours in Mandarin. Proceedings of
the 14th International Congress of Phonetic Sciences, pp.
2359-2362, San Francisco.
Xu, Y. and Wang, Q. E. (2001). Pitch targets and their
realization: Evidence from Mandarin Chinese. Speech
Communication 33: 319-337.
Proceedings of the ISCA Workshop on Speech and Emotion.
Northern Ireland, 2000.
Adolphs, R., Tranel, D., Damasio, H. (2002). Neural Systems for
Recognizing Emotion from Prosody. Emotion 2: 23-51.
Amir, N., Ron, S., (1998). Towards an automatic classification
of emotions in speech. Proceedings of ICSLP 98. Sydney,
Australia.
Cahn, J. E. (1989). Generating Expression in Synthesized
Speech. Master's Thesis, MIT.
Cauldwell, R. T. (2000). Where did the anger go? The role of
context in interpreting emotion in speech. ISCA Workshop on
Speech and Emotion, A conceptual framework for research.
Northern Ireland.
Cosmides, L. (1983). Invariances in the acoustic expression of
emotion during speech. Journal of Experimental Psychology: Human
Perception and Performance 9, pp. 864-881.
Cowie, R., Douglas-Cowie, E. (1996). Automatic Statistical
Analysis of the Signal and Prosodic Signs of Emotion in Speech.
Proceedings of ICSLP 96. Philadelphia.
Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing
Emotion in Speech. Proceedings of ICSLP 96. Philadelphia,
USA.
Ekman, P. (1995). The Nature of Emotion--Fundamental
Questions. Oxford University Press.
Hauser, M. (1977). Information about affective state. The
Evolution of Communication. MIT Press, pp. 476-496.
Heuft, B., Portele, T., Rauth, M. (1996). Emotions in Time
Domain Synthesis. Proceedings of ICSLP 96. Philadelphia,
USA.
Johnstone, I. T., Banse, R., Scherer, K. R. (1995) Acoustic
Profiles from Prototypical Vocal Expressions of Emotion.
Proceedings of the 13th International Congress of Phonetic
Sciences.
Maekawa, K. (1998). Phonetic and phonological characteristics
of paralinguistic information in spoken Japanese. In Proceedings
of the International Conference on Spoken Language
Processing.
Montero, J.M., Gutierrez-Arriola, J., Palazuelos, S., Enriquez,
E., Aguilera, S., Pardo, J.M. (1998). Emotional Speech Synthesis:
From Speech Database to TTS. Proceedings of ICSLP 98.
Sydney, Australia.
Mozziconacci, S. J. L., Hermes, D. J. (1999). Role of
Intonation Patterns in Conveying Emotion in Speech. ICPhS 99.
Mozziconacci, S. J. L. (1998). Speech Variability and
Emotion: Production and Perception. Ph.D. Thesis. Technical
University Eindhoven.
Murray, I. R., Edgington, M. D., Campion, D., Lynn, J. (2000).
Rule-based Emotion Synthesis using Concatenated Speech. ISCA
Workshop on Speech and Emotion, A conceptual framework for
research. Northern Ireland.
Murray, I. R., Arnott, J. L. (1993). Toward the Simulation of
Emotion in Synthetic Speech: A Review of the Literature on Human
Vocal Emotion. Journal of the Acoustical Society of America
93, 1097-1108.
Murray, I.R., Arnott, J. L. (1995). Implementation and testing
of a system for producing emotion-by-rule in synthetic speech.
Speech Communication 16, pp. 369-390.
Ohala, J. (1996). Ethological Theory and the Expression
of Emotion in the Voice. Proceedings of ICSLP 96.
Philadelphia.
Pfeifer, R. (1988). Artificial Intelligence Models of Emotion.
Cognitive Perspectives on Emotion and Motivation. V.
Hamilton et al. (eds.) Kluwer Academic Publishers.
Picard, R. W. (1997) Affective Computing. The MIT
Press.
Rank, E., Pirker, H. (1998). Generating Emotional Speech with a
Concatenative Synthesizer. Proceedings of ICSLP 98. Sydney,
Australia.
Schröder, M. (2001). Emotional Speech Synthesis--a Review.
Proceedings of Eurospeech 2001. Aalborg. pp. 561-564.
Schröder, M., Cowie, R., Douglas-Cowie, E., Westerdijk, M.,
Gielen, S. (2001). Acoustic Correlates of Emotion Dimensions in
View of Speech Synthesis. Proceedings of Eurospeech 2001,
pp. 87-90. Aalborg, Denmark.
Stibbard, R.M. (2001). Vocal Expression of Emotions in
Non-laboratory Speech: An Investigation of the Reading/Leeds
Emotion in Speech Project Annotation Data. Ph.D. thesis.
University of Reading.
Williams, C. E., Stevens, K. N. (1972). Emotions and Speech:
Some Acoustical Factors. Journal of the Acoustical Society of
America 52, 1238-1250.
Acoustic Correlates of Stress and Accent
Beckman, M. E. (1986). Stress and Non-Stress Accent.
(Netherlands Phonetic Archives No. 7). Foris. Second printing, 1992,
by Walter de Gruyter.
Beckman, M. E., Cohen, K. B. (2000). Modeling the articulatory
dynamics of two levels of stress contrast. In M. Horne, (ed.)
Prosody: Theory and Experiment. Studies Presented to Gösta
Bruce, pp. 169-200. Kluwer.
Erickson, D. (1998). Effects of contrastive emphasis on jaw
opening. Phonetica, 55:147-169.
Fry, D. B. (1955). Duration and intensity as physical
correlates of linguistic stress. Journal of Acoustical Society
of America, 30:765-769.
Fry, D. B. (1958). Experiments in the perception of stress.
Language and Speech, 1:126-152.
Gårding, E., Fujimura, O., and Hirose, H. (1970).
Laryngeal control of Swedish word tones. Annual Bulletin of the
Research Institute of Logopedics and Phoniatrics,
27:135-149.
Gussenhoven, C., Repp, B.H., Rietveld, A., Rump, H. H., Terken,
J. (1997). The perceptual prominence of fundamental frequency
peaks. JASA 102, pp. 3009-3022.
Hillenbrand, J. M. and Houde, R. A. (1996). Role of f0
and amplitude in the perception of intervocalic glottal stops.
Journal of Speech and Hearing Research, 39:1182-1190.
Kehoe, M., Stoel-Gammon, C., and Buder, E. H. (1995). Acoustic
correlates of stress in young children's speech. Journal of
Speech and Hearing Research, 38:338-350.
Lieberman, P. (1960). Some acoustic correlates of word stress
in American-English. Journal of Acoustical Society of
America, 32:451-454.
Moon, S.-J. and Lindblom, B. (1990). Interaction between
duration, context, and speaking style in English stressed vowels.
Journal of Acoustical Society of America, pages 40-55.
Pollock, K. E., Brammer, D. M., and Hageman, C. F. (1990). An
acoustic analysis of young children's productions of word stress.
Journal of Phonetics, 21:183-203.
Simada, Z. B. and Hirose, H. Physiological correlates of
Japanese accent patterns. In Annual Bulletin of the Research
Institute of Logopedics and Phoniatrics, volume 5, pages
41-49.
Sluijter, A. M. C. and van Heuven, V. J. Spectral balance as an
acoustic correlate of linguistic stress. Journal of Acoustical
Society of America, 100(4):2471-2485.
Sluijter, A. M. C., van Heuven, V. J., and Pacilly, J. J. A.
Spectral balance as a cue in the perception of linguistic stress.
Journal of Acoustical Society of America,
101(1):503-513.
Terken, J. (1991). Fundamental frequency and perceived
prominence of syllables. JASA 89, pp. 1768-1776.
Terken, J. and Hermes, D. (2000). The perception of prosodic
prominence. In M. Horne (ed.) Prosody: Theory and experiment.
Studies presented to Gösta Bruce. pp. 89-127. Kluwer
Academic Press. Dordrecht.
Physiological Explanation, Articulatory Modeling
Atkinson, J. E. (1978). Correlation analysis of the
physiological factors controlling fundamental voice frequency.
Journal of Acoustical Society of America, 63:211-222.
Browman, C. P. and Goldstein, L. (1990). Tiers in articulatory
phonology, with some implications for casual speech. In Kingston,
J. and Beckman, M., editors, Papers in Laboratory Phonology I:
Between the Grammar and Physics of Speech, pages 341-376.
Cambridge University Press.
Berry, D. A., Herzel, H., Titze, I. R., and Story, B. H.
Bifurcations in excised larynx experiments. Journal of
Voice.
Herman, R., Beckman, M., Honda, K. (1999). Linguistic models of
F0 use, physiological models of F0 control, and the issue of "Mean
Response Time". Language and Speech 42, 373-399.
Herzel, H. (1995). Non-linear dynamics of voiced speech. In
Awrejcewicz, J., editor, Nonlinear Dynamics: New Theoretical and
Applied Results. Akademie Verlag.
Hollien, H. (1981). In search of vocal frequency control
mechanisms. In Bless, D. M. and Abbs, J. H., editors, Vocal Fold
Physiology: Contemporary Research and Clinical Issues, pages
361-367. College-Hill Press, San Diego, CA.
Keating, P. A. The window model of coarticulation: articulatory
evidence. Papers in Laboratory Phonology I. Between the Grammar
and Physics of Speech, pages 451-470. Cambridge University
Press.
Ladefoged, P. (1962). Subglottal activity during speech. 4th
International Congress of Phonetic Science, pages 247-265.
McFarland, D. H. and Smith, A. (1992). Effects of vocal task
and respiratory phase on prephonatory chest-wall movements.
Journal of Speech and Hearing Research, 35(5):971-982.
Lieberman, P., Knudson, R., and Mead, J. Determination of the
rate of change of f0 with respect to subglottal air pressure
during sustained phonation. Journal of Acoustical Society of
America, 45:1537-1543.
Löfqvist, A., Baer, T., McGarr, N. S., and Story, R. S.
(1989). The cricothyroid muscle in voicing control. Journal of
Acoustical Society of America, 85:1314-1321.
Monsen, R. B., Engebretson, A. M., and Vemula, N. R. (1978).
Indirect assessment of the contribution of subglottal air pressure
and vocal fold tension to changes in the fundamental frequency in
English. Journal of Acoustical Society of America,
64(1):65-80.
Munhall, K. and Löfqvist, A. (1992). Gestural
aggregation in speech: laryngeal gestures. Journal of
Phonetics, 20:111-126.
Perrier, P., Ostry, D. J., and Laboissière, R. The equilibrium point hypothesis
and its application to speech motor control. Journal of Speech
and Hearing Research, 39:365-378.
Pierrehumbert, J. (1997). Consequences of Intonation for the
Voice Source. In S. Kiritani, H. Hirose, and H. Fujisaki (eds.)
Speech Production and Language. Mouton de Gruyter, Berlin.
111-131.
Strik, H. (1994). Physiological control and behaviour of the
voice source in the production of prosody. Ph.D. dissertation,
University of Nijmegen.
Strik, H., Boves, L. (1992) Control of fundamental frequency,
intensity and voice quality in speech. Journal of Phonetics
20, pp. 15-25.
Strik H., Boves, L. (1992) On the relation between voice source
parameters and prosodic features in connected speech. Speech
Communication 11, pp. 167-174.
Strik, H., Boves, L. (1995) Downtrend in F0 and
Psb. Journal of Phonetics 23, pp. 203-220.
Titze, I. R. On the relation between subglottal pressure and
fundamental frequency in phonation. Journal of Acoustical
Society of America, 85(2):901-906.
Titze, I. R. The physics of small amplitude oscillation of the
vocal folds. Journal of Acoustical Society of America,
83(4):1536-1552.
Titze, I. R. (1993). Principles of Voice Production.
Prentice-Hall.
Whalen, D. H. and Kinsella-Shaw, J. M. (1997). Exploring the
relationship of inspiration duration to utterance duration.
Phonetica, 54:138-152.
Wier, C. C., Jesteadt, W., and Green, D. M. Frequency
discrimination as a function of frequency and sensation level.
Journal of Acoustical Society of America, 61:178-184.
Wilder, C. N. (1981). Chest wall preparation for phonation in
female speakers. In Bless, D. M. and Abbs, J. H., editors, Vocal
Fold Physiology: Contemporary Research and Clinical Issues,
pages 109-123. College-Hill Press, San Diego, CA. ISBN
0-933014-87-2.
Winkworth, A. L., Davis, P. J., Adams, R. D., and Ellis, E.
(1995). Breathing patterns during spontaneous speech. Journal of
Speech and Hearing Research, 38(1):124-144.
Winkworth, A. L., Davis, P. J., Ellis, E., and Adams, R. D.
(1994). Variability and consistency in speech breathing during
reading: lung volumes, speech intensity, and linguistic factors.
Journal of Speech and Hearing Research, 37(3):535-556.
Xu, Y. and Sun, X. (2002). Maximum speed of pitch change and how
it may relate to speech. Journal of the Acoustical Society of
America 111: 1399-1413.
Discourse
Bolinger, D. (1989). Intonation and its uses: Melody in
grammar and discourse. Stanford University Press.
Hirschberg, J. (1992). Some Intonational Characteristics of
Discourse Structure. ICSLP-92.
Hirschberg, J. (1993). Pitch Accent in Context: Predicting
Intonational Prominence from Text. Artificial Intelligence,
63(1/2), pp. 305-340.
Hirschberg, J., Nakatani, C. (1996). A Prosodic Analysis of
Discourse Segments in Direction-Giving Monologues. Proceedings
of the 34th Annual Meeting of the Association for Computational
Linguistics. pp. 286-293.
Hirschberg, J. and Pierrehumbert, J. (1986). The intonational
structuring of discourse. In Proceedings of the 24th Annual
Meeting of the Association for Computational Linguistics,
volume 24, pages 136-144.
Hirschberg, J., Litman, D. (1994). Empirical Studies on the
Disambiguation of Cue Phrases. Computational Linguistics, 19
(3), pp. 501-530.
Grosz, B., Hirschberg, J. (1992). Some intonational
characteristics of discourse structure. Proceedings of ICSLP
92, V. 1. Banff, Canada. 429-432.
Nakajima S., Allen, J. F. (1993). A study on prosody and
discourse structure in cooperative dialogues. Phonetica 50,
pp. 197-210.
Nakatani, C. H., Hirschberg, J. (1994). A Corpus-based study of
repair cues in spontaneous speech. Journal of the Acoustical
Society of America, 95(3), 1603-1616.
Hirschberg, J., Avesani, C. (1997). The Role of Prosody in
Disambiguating Potentially Ambiguous Utterances in English and
Italian. ESCA Tutorial and Research Workshop on Intonation.
Athens, pp.189-192.
Pierrehumbert, J., Hirschberg, J. (1990). The Meaning of
Intonation in the Interpretation of Discourse. In P. Cohen, J.
Morgan, and M. Pollack, (eds.) Intentions in Communication.
MIT Press, Cambridge MA. 271-311.
Swerts, M., Geluykens, R. (1993). The prosody of information
units in spontaneous monologue. Phonetica 50, pp.
189-196.
Swerts, M., Geluykens, R. (1994). Prosody as a marker of
information flow in spoken discourse. Language and Speech
37(1), pp. 21-43.
Swerts, M., Hirschberg, J. (eds) (1999). Prosody and
conversation. Special double issue of Language and Speech on
Prosody and Conversation, 41:3/4.
Swerts, M. (1997). Prosodic features at discourse boundaries of
different strength. Journal of the Acoustical Society of
America 101 (1), pp. 514-521.
Swerts, M., Ostendorf, M. (1997). Prosodic and lexical
indications of discourse structure in human-machine interactions.
Speech Communication 22, pp. 25-41.
Terken, J., Hirschberg, J. (1994). Deaccentuation of words
representing 'Given' information: Effects of persistence of
grammatical role and surface position. Language and Speech
37, pp. 125-145.
Terken, J. and Nooteboom, S.G. (1987). Opposite effects of
accentuation and deaccentuation on verification latencies for Given
and New information. Language and Cognitive Processes 2,
pp.145-163.
Terken, J. (1985). Communicative Functions of Pitch Accents.
Some experiments. Ph.D. thesis, Leiden University.
Terken, J. (1984). The Distribution of Accents in Instructions
as a Function of Discourse Structure. Language and Speech
27, pp. 269-289.
Ward, G., Hirschberg, J. (1985). Implicating uncertainty: The
pragmatics of fall-rise. Language 61, pp. 747-776.
Segmental Effects
Haggard, M., Ambler, S., and Callow, M. Pitch as a voicing cue.
Journal of Acoustical Society of America, 47:613-617.
Hombert, J.-M. Consonant types, vowel quality and tone. In
Fromkin, V. A., editor, Tone: A Linguistic Survey, pages
77-111. Academic Press, New York.
Lea, W. (1973). Segmental and suprasegmental influences on
fundamental frequency contours. In Hyman, L., editor, Consonant
Types and Tones, pages 15-70. University of Southern
California, Los Angeles.
Liberman, M., Shadle, C. H., Pierrehumbert, J. B. The intrinsic
pitch of vowels in sentence context. JASA 66.
Massaro, D. W. and Cohen, M. M. (1976). The contribution of
fundamental frequency and voice onset time to the /zi/-/si/
distinction. Journal of Acoustical Society of America,
60:704-717.
Silverman, K. E. (1987). The Structure and Processing of
Fundamental Frequency Contours. Ph.D. thesis, University of
Cambridge.
Terken, J. (1995). The perceptual relevance of
micro-intonation: Enhancing the Voicing Distinction in Synthetic
Speech by means of consonantal F0 perturbation. Studies in
applied linguistics 2, pp. 103-124.
Umeda, N. (1981). Influence of segmental factors on fundamental
frequency in fluent speech. Journal of the Acoustical Society of
America 70, pp. 350-355.
Prosody Markup Language
Bird, S. and M. Liberman (2001). A Formal Framework for
Linguistic Annotation. Speech Communication 33.1-2, pp.
23-60.
Kochanski, G. P. and Shih, C. (2002). Soft templates for
prosody mark-up. Speech Communication. In print.
Sproat, R., Hunt, A., Ostendorf, M., Taylor, P., Black, A., and
Lenzo, K. (1998). Sable: A standard for TTS markup. In
Proceedings of the International Conference on Spoken Language
Processing, pages 1719-1724.
Taylor, P. and Isard, A. SSML: A speech synthesis markup
language. Speech Communication, 21:123-133.
Representation of Prosody, Language Description
Beckman, M. E. (1996). The parsing of prosody. Language and
Cognitive Processes, 11, pp. 17-67.
Beckman, M. E., Edwards, J. (1990). Lengthenings and
shortenings and the nature of prosodic constituency. In J. Kingston
& M.E. Beckman, (eds.) Papers in Laboratory Phonology I:
Between the Grammar and the Physics of Speech, pp. 152-178.
Cambridge University Press.
Beckman, M. E., Edwards, J. (1994). Articulatory evidence for
differentiating stress categories. In P.A. Keating, (ed.),
Phonological Structure and Phonetic Form: Papers in Laboratory
Phonology III, pp. 7-33. Cambridge University Press.
Beckman, M. E., & Edwards, J. (1992). Intonational
categories and the articulatory control of duration. In Y. Tohkura,
E. Vatikiotis-Bateson, Y. Sagisaka, (eds.), Speech Perception,
Production and Linguistic Structure, pp. 356-375. Tokyo: OHM
Publishing Co.
Bellegarda, J., Silverman, K., Lenzo, K., and Anderson, V.
(2001). Statistical prosodic modeling: from corpus design to
parameter estimation. IEEE Transactions on Speech and Audio
Processing, 9(1):52-66.
Bolinger, D. L. (1958). A theory of pitch accent in English.
Word, 14:109-149.
Bolinger, D. (1986). Intonation and its parts: Melody in
Spoken English. Stanford University Press.
Connell, B. A., Ladd, D. R. (1990). Aspects of Pitch
Realization in Yoruba. Phonology, 7(1), 1-29.
Chen, Y., Gao, W., Zhu, T., and Ma, J. (2000). Multi-strategy
data mining on Mandarin prosodic patterns. ICSLP, Beijing,
China.
D'Imperio, M. (to appear). Focus and tonal structure in
Neapolitan Italian. Speech Communication.
D'Imperio, M., Rosenthall, S. (1999). Phonetics and phonology of
main stress in Italian. Phonology 16 (1), pp. 1-27.
Edwards, J., Beckman, M. E., Fletcher, J. (1991). Articulatory
kinematics of final lengthening. Journal of the Acoustical
Society of America, 89. pp. 369-382.
Erickson, D., Honda, K., Hirai, H., Beckman, M. E. (1995). The
production of low tones in English intonation. Journal of
Phonetics, 23(1/2), pp. 179-188.
Gandour, J., Potisuk, S., Dechongkit, S. (1994). Tonal
coarticulation in Thai. Journal of Phonetics, 22 (4), pp.
477-492.
Grønnum, N. (1992). The groundworks of Danish intonation:
An introduction. Museum Tusculanum Press.
Gårding, E. (1987). Speech act and tonal pattern in
standard Chinese: Constancy and variation. Phonetica, 44,
pp. 13-29.
Gosy, M., Terken, J. (1994). Question marking in Hungarian:
Timing and Height of pitch peaks. Journal of Phonetics 22,
pp. 269-281.
Grabe, E., Gussenhoven, C., Haan, J., Marsi, E. C., Post, B.
(1997) Preaccentual pitch and speaker attitude in Dutch.
Language and Speech 41(1), pp. 63-85.
Grice, M., Ladd, D. R., Arvaniti, A. (2000). On the place of
"phrase accents" in intonational phonology. Phonology 17,
pp. 143-185.
Hadding-Koch, K. (1961). Acoustico-phonetic studies in the
intonation of southern Swedish. Technical report, C. W. K. Gleerup,
Lund, Sweden.
House, J., Dankovicová, J., Huckvale, M. (1999).
Intonation modelling in Prosynth: An integrated prosodic approach
to speech synthesis. International Congress of Phonetic Sciences,
San Francisco.
Jilka, M., Möhler, G., Dogil, G. (1999). Rules for the
generation of ToBI-based American English intonation. Speech
Communication, 28, pp.83-108.
Ni, J. F., Wang, R. H., Hirose, K. (1997). Quantitative
analysis and formulation of tone concatenation in Chinese f0
contours. Eurospeech 97. Rhodes, Greece, pp. 195-198.
Ni, J. F., Kawai, G., Hirose, K. (1998). A synthesis-oriented
model of phrasal pitch movements in Standard Chinese. ICSLP 98,
Sydney, Australia, paper no. 750.
Prieto, P., Hirschberg, J. (1996). Training intonational
phrasing rules automatically for English and Spanish
Text-to-Speech. Speech Communication, 18, pp. 281-290.
Hirschberg, J., Rambow, O. (2001). Learning Prosodic Features
using a Tree Representation. Proceedings of Eurospeech 2001,
Denmark.
Jun, S.-A. (forthcoming) Editor. Prosodic Models and
Transcription: Towards Prosodic Typology. Oxford University
Press.
Jun, S.-A. (1996). The Phonetics and Phonology of Korean
Prosody: intonational phonology and prosodic structure. Garland
Publishing, New York.
Jun, S.-A., Fougeron, C. (2000). A Phonological Model of French
Intonation. In A. Botinis (ed.) Intonation: Analysis, Modelling
and Technology. Kluwer Academic Publishers. pp. 209-242.
Kochanski, G. and Shih, C. (2001). Automated modelling of
Chinese intonation in continuous speech. Proceedings of
Eurospeech 2001, Aalborg, Denmark. International Speech
Communication Association.
Kochanski, G. P. and Shih, C. (2000). Stem-ML: Language
independent prosody description. Proceedings of the 6th
International Conference on Spoken Language Processing,
Beijing, China.
Kochanski, G. P., Shih, C., and Jing, H. Y. (2001).
Hierarchical structure and word strength prediction in Mandarin
prosody. In 4th ISCA Tutorial and Research Workshop on Speech
Synthesis, Perthshire, Scotland.
Laniran, Y. (1992). Intonation in a tone language: the
phonetic implementation of tone in Yoruba. Ph.D. Dissertation,
Cornell University.
Liberman, M. Y. (197). The intonation system of English.
Garland Publishing.
Liberman, M. Y. and Prince, A. (1977). On stress and linguistic
rhythm. Linguistic Inquiry, 8:249-336.
Liberman, M., Schultz, J. M., Hong, S., Okeke, V. (1993). The
phonetic interpretation of tone in Igbo. Phonetica 50(3),
pp. 147-160.
Lieberman, P. (1967). Intonation, Perception and
Language. MIT Press, Cambridge, Mass.
Möbius, B. (1993). A quantitative model of German
intonation -- Analysis and synthesis of fundamental frequency
contours. Ph. D. Dissertation, the University of Bonn.
Needleman, A. R. (1998). Quantification of context effects in
speech perception: influence of prosody. Clinical Linguistics
and Phonetics, 12(4):305-327.
Ohala, J. and Hirano, M. (1967). Studies of pitch change in
speech. In UCLA Working papers on phonetics, pages
80-84.
Ohala, J. and Ladefoged, P. (1970). Further investigation of
pitch regulation in speech. volume 14, pages 12-24.
Ohala, J. J. (1992). The segment, primitive or derived? In
Docherty, G. J. and Ladd, D. R., editors, Papers in Laboratory
Phonology II: Gesture, Segment, Prosody, pages 166-183.
Cambridge University Press.
Pierrehumbert, J. (1979). The Perception of Fundamental
Frequency Declination. Journal of the Acoustical Society of
America 66, pp. 363-369.
Pierrehumbert, J. (1980). The Phonology and Phonetics of
English Intonation. Ph.D. thesis, MIT.
Pierrehumbert, J. B. and Beckman, M. E. (1988). Japanese
Tone Structure. The MIT Press.
Prevost, S., Steedman, M. (1994). Specifying intonation from
context for speech synthesis. Speech Communication 15, pp.
139-153.
Prieto, P., Nibert, H., Shih, C. (1996). The Absence or
Presence of a Declination Effect on the Descent of F0 Peaks?
Evidence from Mexican Spanish. In K. Zagona (ed.) Grammatical
Theory and Romance Languages. John Benjamins Publishing
Company.
Prieto, P. (1998) The Scaling of the L Tone Line in Spanish
Downstepping Contours. Journal of Phonetics, 26, pp.
261-282.
Prieto, P., Shih, C., Nibert, H. (1996). Pitch Downtrend in
Spanish. Journal of Phonetics 24(4), pp. 445-473.
Shattuck-Hufnagel, S., Ostendorf, M., Ross, K. (1994). Stress
shift and early pitch accent placement in lexical items in American
English. Journal of Phonetics, 22, pp. 357-388.
Shattuck-Hufnagel, S., Turk, A. (1996). A prosody tutorial for
investigators of auditory sentence processing. Journal of
Psycholinguistic Research, V. 25, No. 2, pp. 193-247.
Shih, C. Mandarin Third Tone Sandhi and Prosodic Structure. In
J. Wang and N. Smith (eds). Studies in Chinese Phonology,
Mouton de Gruyter, pp. 81-123.
Shih, C. (1997). Declination in Mandarin. Proceedings of the
ESCA Intonation Workshop, Athens.
Shih, C. (2000). A declination model of Mandarin Chinese. In
Botinis, A., editor, Intonation: Analysis, Modelling and
Technology, pages 243-268. Kluwer Academic Publishers.
Shih, C. and Kochanski, G. P. (2000). Chinese tone modeling
with Stem-ML. In Proceedings of the sixth International
Conference on Speech and Language Processing, Beijing,
China.
Shih, C., Kochanski, G. P. (2001). Synthesis of prosodic
styles. 4th ISCA Tutorial and Research Workshop on Speech
Synthesis, Scotland.
Shih, C., Kochanski, G. P. (2001). Prosody control for speaking
and singing styles. Eurospeech 2001, pp. 669-672 (no. 1672),
Aalborg, Denmark.
Shih, C. (1986). The prosodic domain of tone sandhi in
Chinese. PhD thesis, University of California, San Diego.
Shih, C. (1988). Tone and intonation in Mandarin. Working
Papers of the Cornell Phonetics Laboratory, Number 3: Stress, Tone
and Intonation, pages 83-109. Cornell University.
Shih, C. and Sproat, R. (1992). Variations of the Mandarin
rising tone. Proceedings of the IRCS workshop on prosody in
natural speech, Technical Report IRCS 92-37, pages 193-200.
University of Pennsylvania, Institute for Research in Cognitive
Science.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M.,
Wightman, C., Price, P., Pierrehumbert, J., and Hirschberg, J.
(1992). ToBI: A standard for labeling English prosody.
Proceedings of the International Conference on Spoken Language
Processing, volume 2.
Speer, S., Shih, C., Slowiaczek, M. (1989). Prosodic Structure
in Language Understanding: Evidence from Tone Sandhi in Mandarin.
Language and Speech, 32(4), pp. 337-354.
Sproat, R. W., editor (1998). Multilingual Text-to-Speech
Synthesis: The Bell Labs Approach. Kluwer Academic
Publishers.
Steedman, M. (1991). Structure and intonation. Language
68, pp. 260-296.
Stevens, K. Phonetic evidence for hierarchies of features.
Papers in Laboratory Phonology III, pages 242-258.
Terken, J. (1993). Synthesizing natural-sounding intonation for
Dutch: rules and perceptual evaluation. Computer Speech and
Language 7, pp. 27-48.
Trísková, H. (ed.) (2001). Tone, Stress and
Rhythm in Spoken Chinese. Special issue of Journal of Chinese
Linguistics no. 17.
Turk, A. E. and Sawusch, J. R. The processing of duration and
intensity cues to prominence. Journal of Acoustical Society of
America, 99(6):3782-3790.
van Santen, J. P. H., Shih, C., and Möbius, B. (1998).
Intonation. In R. Sproat (ed.) Multilingual Text-to-Speech
Synthesis: The Bell Labs Approach, pp. 141-190. Kluwer Academic
Publishers.
Venditti, J. J., Jun, S.-A., Beckman, M. E. (1996). Prosodic
cues to syntactic and other linguistic structures in Japanese,
Korean, and English. In J. Morgan & K. Demuth, (eds.),
Signal to Syntax: Bootstrapping from Speech to Grammar in Early
Acquisition, pp. 287-311. Mahwah, NJ: Lawrence Erlbaum.
Waibel, A. (1988). Prosody and speech recognition.
Morgan Kaufmann Publishers, Inc., San Mateo, California.
Wang, M., Hirschberg, J. (1992). Automatic Classification of
Intonational Phrase Boundaries. Computer Speech and Language
6, pp. 175-196.
Xu, Y. (1993). Contextual Tonal Variation in Mandarin
Chinese. Ph.D. thesis, The University of Connecticut.
Xu, Y. (1994). Production and perception of coarticulated
tones. Journal of the Acoustical Society of America 95:
2240-2253.
Xu, Y. (2001). Sources of tonal variations in connected speech.
Journal of Chinese Linguistics V. 17. pp. 1-31.
Yuan, J., Shih, C., Kochanski, G. P. (2002). Comparison of
Declarative and Interrogative Intonation in Chinese. In Bel, B. and
Marlien, I. (eds.) Proceedings of the Speech Prosody 2002
Conference, Aix-en-Provence, Laboratoire Parole et Langage,
April 2002, pp. 711-714.