Syllabus - Mathematical Models for Speech and Language
Hilary Term 2004. Tuesdays, 12-1, Centre for Language and
Philology Common Room.
Week 1
Handout1A: Why probabilities? Why one needs
quantitative, probabilistic models of the world. Errors:
Experimental and Other.
Handout1B:
The logic and math of probabilities. This includes definitions
and basic math on probabilities (e.g. what is the probability
that either of two events will happen).
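
As a small illustration of the "either of two events" rule, here is a minimal sketch of inclusion-exclusion, P(A or B) = P(A) + P(B) - P(A and B), checked by brute-force enumeration. The two-dice events and the helper names (p_either, prob, is_a, is_b) are assumptions chosen for the example, not part of the handout.

```python
from fractions import Fraction
from itertools import product


def p_either(p_a, p_b, p_both):
    """P(A or B) by inclusion-exclusion: P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, as the fraction of outcomes satisfying it."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_a(o):
    return o[0] == 6            # A: the first die shows a six


def is_b(o):
    return o[0] + o[1] >= 10    # B: the total is at least ten


p_a = prob(is_a)
p_b = prob(is_b)
p_both = prob(lambda o: is_a(o) and is_b(o))

# The rule and direct counting of "A or B" should agree.
p_or = p_either(p_a, p_b, p_both)
p_or_direct = prob(lambda o: is_a(o) or is_b(o))
print(p_or, p_or_direct)
```

Simply adding P(A) and P(B) would count the overlapping outcomes twice, which is why the joint term is subtracted.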
Odds Ratio form of Bayes' Theorem. Recursive use of Bayes'
Theorem. Deciding between a few discrete alternatives.
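
A minimal sketch of the odds form and its recursive use: posterior odds = prior odds x likelihood ratio, applied once per piece of evidence. The likelihood ratios below are invented numbers, the function names are hypothetical, and the sketch assumes the pieces of evidence are conditionally independent given each hypothesis.

```python
def update_odds(prior_odds, likelihood_ratios):
    """Apply the odds form of Bayes' Theorem once per piece of evidence.

    Each likelihood ratio is P(evidence | H1) / P(evidence | H2).
    The posterior odds after one update become the prior odds for the next.
    """
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


def odds_to_probability(odds):
    """Convert odds for H1 against H2 into P(H1), given only two alternatives."""
    return odds / (1.0 + odds)


prior_odds = 1.0            # assumed: both hypotheses equally likely at the start
ratios = [3.0, 0.5, 4.0]    # assumed likelihood ratios for three observations

posterior_odds = update_odds(prior_odds, ratios)
posterior_prob = odds_to_probability(posterior_odds)
print(posterior_odds, posterior_prob)
```

A ratio above 1 favours the first hypothesis and a ratio below 1 favours the second, so evidence can push the odds in either direction.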
Connection to statistical significance. Preliminary discussion
of Bayes classifiers. Applications: Language and Spam
Identification. Discussion of using statistics to assign
authorship.
Handout3A:
Decision Rules: maximum likelihood, maximum a posteriori
probability (MAP), minimum risk, expected value. How averages
work, including linear regression. Probabilities are
abstractions. Using Bayes' Theorem to estimate probabilities
from frequencies. Then, you can use those probabilities in
Bayes' Theorem to estimate P(Spam) or authorship or style.
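
The two steps above can be sketched as follows: first estimate word probabilities from document frequencies, then combine them with Bayes' Theorem to get P(Spam). This is only an illustration under stated assumptions: the training messages are invented, the estimator is Laplace's rule of succession (the posterior mean under a uniform prior), words are treated as independent given the class, and only words present in the message are used. All names are hypothetical.

```python
import math
from collections import Counter

# Tiny invented training set, for illustration only.
spam_docs = ["win money now", "free money offer"]
ham_docs = ["meeting notes attached", "lunch at noon", "project meeting now"]


def word_doc_counts(docs):
    """For each word, count the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))
    return counts


def estimate(k, n):
    """Estimate a probability from k occurrences in n trials.

    This is the posterior mean under a uniform prior (rule of succession),
    so it is never exactly 0 or 1, even when k == 0 or k == n.
    """
    return (k + 1) / (n + 2)


spam_counts = word_doc_counts(spam_docs)
ham_counts = word_doc_counts(ham_docs)


def p_spam(message):
    """P(Spam | words in message), accumulated in log-odds form."""
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    prior = estimate(n_spam, n_spam + n_ham)
    log_odds = math.log(prior / (1.0 - prior))
    for word in set(message.split()):
        p_word_spam = estimate(spam_counts[word], n_spam)
        p_word_ham = estimate(ham_counts[word], n_ham)
        log_odds += math.log(p_word_spam / p_word_ham)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


print(p_spam("free money"))
print(p_spam("project meeting"))
```

A maximum a posteriori decision would label a message as spam when p_spam exceeds 0.5; a minimum-risk rule would move that threshold according to the cost of each kind of error.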
Good-Turing Estimation: how to estimate the probability that the
Sun will not rise tomorrow, or of other rare events. How do you
compute the probability that the word "immunological" will appear in the
next sentence, given a small corpus of text?
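
A minimal sketch of the basic (unsmoothed) Good-Turing idea: the total probability of all unseen words is estimated as N1 / N, where N1 is the number of words seen exactly once and N is the number of tokens. The toy corpus and function names are assumptions for illustration; practical versions smooth the counts-of-counts first.

```python
from collections import Counter


def unseen_mass(tokens):
    """Good-Turing estimate of the total probability of unseen words: N1 / N."""
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(tokens)


def adjusted_count(r, tokens):
    """Basic Good-Turing adjusted count r* = (r + 1) * N_{r+1} / N_r.

    Returns None when N_r or N_{r+1} is zero, where the unsmoothed
    formula is undefined or unreliable.
    """
    counts_of_counts = Counter(Counter(tokens).values())
    n_r = counts_of_counts[r]
    n_r_plus_1 = counts_of_counts[r + 1]
    if n_r == 0 or n_r_plus_1 == 0:
        return None
    return (r + 1) * n_r_plus_1 / n_r


# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the dog sat on the log".split()

mass = unseen_mass(corpus)
r_star = adjusted_count(1, corpus)
print(mass, r_star)
```

The unseen mass is shared among every word that has not yet appeared, so the probability of one particular unseen word such as "immunological" also depends on an assumption about how many unseen words there are.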