Greg Kochanski

# Syllabus - Mathematical Models for Speech and Language

Hilary Term 2004. Tuesdays, 12-1, Centre for Language and Philology Common Room.

Week 1
• Handout1A: Why probabilities? Why one needs quantitative, probabilistic models of the world. Errors: Experimental and Other.
• Handout1B: The logic and math of probabilities. This includes definitions and the basic mathematics of probabilities (e.g. the probability that either of two events will happen).
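The "either of two events" rule from Handout1B is the inclusion-exclusion formula, P(A or B) = P(A) + P(B) - P(A and B). A minimal sketch in Python, checked against a simulation (the coin-flip example is illustrative, not from the handout):

```python
import random

def p_a_or_b(p_a, p_b, p_both):
    """Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

# Two independent fair coins, each heads with probability 0.5;
# for independent events, P(A and B) = P(A) * P(B).
p_a, p_b = 0.5, 0.5
exact = p_a_or_b(p_a, p_b, p_a * p_b)   # 0.75

# Check by simulation.
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.random() < p_a or random.random() < p_b)
print(exact, hits / trials)   # the two numbers should agree to within ~1%
```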
Week 2
• Handout2A: Case studies of pathological science. Why the unaided human brain is a dangerous tool.
Week 3
• Handout3A: Odds-ratio form of Bayes' Theorem. Recursive use of Bayes' Theorem. Deciding between a few discrete alternatives. Connection to statistical significance. Preliminary discussion of Bayes classifiers. Applications: language and spam identification. Discussion of using statistics to assign authorship.
• Brief overview of Fuzzy Logic.
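The odds-ratio form of Bayes' Theorem makes the recursive update very simple: posterior odds = prior odds times the likelihood ratio, applied once per observation. A minimal spam-classification sketch in Python; the per-word probabilities below are invented for illustration, not drawn from any real corpus:

```python
def update_odds(odds, p_word_given_spam, p_word_given_ham):
    """One recursive Bayes step in odds form:
    posterior odds = prior odds * likelihood ratio."""
    return odds * (p_word_given_spam / p_word_given_ham)

# Hypothetical per-word probabilities: word -> (P(word|spam), P(word|ham)).
likelihoods = {
    "viagra": (0.30, 0.001),
    "meeting": (0.01, 0.10),
    "free": (0.20, 0.05),
}

odds = 1.0   # prior odds P(spam)/P(ham) = 1, i.e. a 50/50 prior
for word in ["free", "viagra"]:
    odds = update_odds(odds, *likelihoods[word])

p_spam = odds / (1.0 + odds)   # convert odds back to a probability
print(odds, p_spam)
```

The same loop decides between any few discrete alternatives (spam vs. ham, one language vs. another, one author vs. another): only the likelihood table changes.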
Week 4
Continuous estimation:
• Decision rules: maximum likelihood, maximum a posteriori probability (MAP), minimum risk, expected value. How averages work, including linear regression. Probabilities are abstractions. Using Bayes' Theorem to estimate probabilities from frequencies. Then, you can use those probabilities in Bayes' Theorem to estimate P(Spam) or authorship or style.
• Good-Turing estimation: how to estimate the probability that the Sun will not rise tomorrow, or other rare events. How do you compute the probability that the word "immunological" will appear in the next sentence, given a small corpus of text?
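The simplest Good-Turing idea answers the rare-event question above: the total probability of all *unseen* words is estimated by N1/N, where N1 is the number of word types seen exactly once and N is the total token count. A minimal sketch, using a toy corpus rather than real text:

```python
from collections import Counter

def good_turing_unseen_mass(tokens):
    """Good-Turing estimate of the total probability of unseen words:
    P(unseen) ~ N1 / N, where N1 is the number of word types that
    occur exactly once and N is the total number of tokens."""
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(tokens)

corpus = "the cat sat on the mat the dog sat".split()
# counts: the:3, sat:2, cat:1, on:1, mat:1, dog:1  ->  N1 = 4, N = 9
print(good_turing_unseen_mass(corpus))   # 4/9, about 0.444
```

With a small corpus this mass is large, which is the point: a naive frequency estimate would wrongly assign probability zero to "immunological" just because it has not appeared yet.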
Week 5
• Text classifier architecture. Spam/language/authorship reprise, this time with Good-Turing and N-gram models.
• Bootstrap resampling: An easy-to-remember but computer-intensive way to look at complex problems or to handle non-Gaussian data.
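Bootstrap resampling is easy to state: resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the resulting distribution. A minimal percentile-bootstrap sketch; the duration measurements are hypothetical, chosen only to be skewed and non-Gaussian:

```python
import random

def bootstrap_ci(data, stat, n_resamples=10_000, alpha=0.05):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic each time, and report the central (1 - alpha) interval."""
    stats = sorted(
        stat([random.choice(data) for _ in range(len(data))])
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples)]
    return lo, hi

random.seed(1)
# Hypothetical vowel-duration measurements (ms): skewed, with outliers.
durations = [62, 70, 71, 75, 80, 85, 90, 95, 140, 210]
mean = lambda xs: sum(xs) / len(xs)
print(mean(durations), bootstrap_ci(durations, mean))
```

No Gaussian assumption enters anywhere, which is why the method handles outlier-heavy data that would mislead a textbook t-interval.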
Week 6
No new material - Review.
Week 7
• Confidence intervals and hypothesis testing.
• Monte-Carlo simulations. Q: If you flip a coin, how often do you get three heads in a row? A: Let's do a million flips on the computer and see!
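The coin-flip question can be answered exactly like the bullet says: do a million flips on the computer and count. A minimal sketch that checks how often a window of three consecutive flips is all heads, which should come out near (1/2)^3 = 0.125:

```python
import random

random.seed(0)
n = 1_000_000
flips = [random.random() < 0.5 for _ in range(n)]   # True means heads

# Count the windows of three consecutive flips that are all heads.
runs = sum(1 for i in range(n - 2)
           if flips[i] and flips[i + 1] and flips[i + 2])
print(runs / (n - 2))   # should be close to (1/2)**3 = 0.125
```

The appeal of Monte Carlo is that the same three lines of counting work unchanged for questions with no tidy closed-form answer.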
Week 8
Unfulfilled Ambitions
• Bayesian Belief Networks.
• Information Theory (Claude Shannon). Entropy and Bits. Measuring complexity.
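Shannon's entropy, H = -sum of p·log2(p), measures complexity in bits. A minimal sketch (the distributions are illustrative examples, not course data):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p * log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair coin: 1.0 bit
print(entropy_bits([0.25] * 4))    # uniform over 4 symbols: 2.0 bits
print(entropy_bits([0.9, 0.1]))    # biased coin: about 0.47 bits
```

The biased coin carries less than one bit per flip, which is the sense in which entropy measures how unpredictable, and hence how complex, a source is.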

Last Modified Sun Jun 8 07:23:35 2008. Greg Kochanski.