H1 Syllabus -- Modeling of Language Phenomena

Instructors: Chilin Shih and Greg P. Kochanski
May 20 - August 8, 2002, Wednesdays, 6:00 - 9:25 P.M.

The goal of this course is to apply statistical, mathematical, and physiological concepts to understand language and to describe it quantitatively. We emphasize hands-on experience: using on-line databases, testing and verifying hypotheses, and acquiring basic computational skills. Students will use this knowledge to build a software model of phoneme duration and to identify the language in which a document is written.

There is one required textbook: *The Cartoon Guide to Statistics*,
by Larry Gonick and Woollcott Smith. HarperPerennial, New York, 1993.
ISBN 0-06-273102-5.

Conditional probabilities, Bayes' theorem, MAP (maximum a posteriori) estimation, Good-Turing estimation.
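As a reminder of how the first three of these ideas connect, here is a short sketch in illustrative notation (the symbols are assumptions, not taken from the course materials): $h$ stands for a hypothesis, such as the language of a document, and $d$ for the observed data, such as its text.

```latex
P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)},
\qquad
\hat{h}_{\mathrm{MAP}} = \arg\max_{h} P(h \mid d)
                       = \arg\max_{h} P(d \mid h)\,P(h)
```

The second equality holds because $P(d)$ does not depend on $h$, so it can be dropped when comparing hypotheses.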

H3 Statistical concepts: Frequency, ratio, rank, mean, median, standard deviation, Zipf's law, graphical display, N-grams, CART, multiple linear regression.
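To make frequency, rank, and Zipf's law concrete, the following is a minimal Python sketch, not course material. The function name `rank_frequency` and the sample sentence are hypothetical, chosen only for illustration. It counts word frequencies and lists them by rank; Zipf's law suggests that in a large corpus the product of rank and frequency stays roughly constant, which a toy sentence like this one is far too small to show.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Hypothetical sample text, for illustration only.
sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    # The last column is rank * count, the quantity Zipf's law is about.
    print(rank, word, count, rank * count)
```

On a real corpus, plotting log(count) against log(rank) gives the usual graphical check: an approximately straight line is consistent with Zipf's law.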

H3 Linguistics concepts: Linguistic units: phones, syllables, words, phones per second, documents, linguistic distribution, durations, document language.

H3 Physiological Concepts: Articulator motions, articulatory definition of phonemes, springs, masses, and accelerations, muscle physiology, control strategies.

H3 Applications: Language identification, author identification, modeling of speech segment durations.
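As one possible illustration of language identification, here is a minimal Python sketch that scores a text against character trigram counts for each language and picks the best match. It is an assumption about how such a model might look, not the method used in the course. The function names (`char_ngrams`, `train`, `identify`), the add-one smoothing, and the two tiny training sentences are all hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return the list of overlapping character n-grams in text."""
    # Pad with spaces so word-initial and word-final n-grams are included.
    text = " " + text.lower() + " "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language from {language: text}."""
    return {lang: Counter(char_ngrams(text, n))
            for lang, text in samples.items()}


def identify(text, models, n=3):
    """Return the language whose model gives the highest log-likelihood."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing (an assumption of this sketch); the extra 1 in
        # the denominator reserves mass for n-grams unseen in any training text.
        score = sum(
            math.log((counts[g] + 1) / (total + len(vocab) + 1))
            for g in char_ngrams(text, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Hypothetical toy training data; a real model needs far more text.
models = train({
    "english": "the quick brown fox jumps over the lazy dog "
               "and the cat sleeps in the house",
    "french": "le renard brun rapide saute par dessus le chien paresseux "
              "et le chat dort dans la maison",
})
print(identify("the dog is in the house", models))
print(identify("le chat est dans la maison", models))
```

Because every language is given the same prior here, choosing the highest likelihood coincides with the MAP choice; unequal priors would add a `log` prior term to each score.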

H3 Computation skills: Unix file management, pipe, tr, sort, uniq, basic programming: sh, awk.

