Greg Kochanski

# Syllabus - Statistics for Linguists

Hilary Term 2006.

Chapters refer to "The Cartoon Guide to Statistics," by Larry Gonick and Woollcott Smith, HarperCollins Publishers, 1993, ISBN 0-06-273102-5. The course uses the open-source "R" package for demonstrations; it can be obtained at http://www.r-project.org . A good tutorial can be found at http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html . Other tutorials are at http://personality-project.org/r/r.short.html and, for more advanced users, at http://www.psych.upenn.edu/~baron/rpsych/rpsych.html .

Week 1
Scientific bloopers and summary statistics. Histograms and graphical displays of data. Measures of position (mean and median) and measures of spread (standard deviation and interquartile range). Read Gonick and Smith chapters 1 and 2.
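The four summary statistics above can be computed with Python's standard `statistics` module. This is a minimal sketch, not part of the course's R material; the data are ten made-up numbers with one deliberate outlier (100), chosen to show that an outlier drags the mean far more than the median or the interquartile range. Quartile conventions differ between packages, so the IQR from `statistics.quantiles` may differ slightly from R's default `quantile()` or `IQR()`.

```python
import statistics

def summarize(data):
    """Return measures of position and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "iqr": q3 - q1,                   # interquartile range
    }

# Hypothetical measurements; the 100 is a deliberate outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = summarize(data)
print(summary)
```

Here the mean is 14.5 while the median is 5.5; replacing the 100 with 10 would pull the mean down to 5.5 and leave the median unchanged.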
Week 2
Probability and Bayes' Theorem. Read Gonick and Smith chapter 3.
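Bayes' theorem can be checked with exact arithmetic. The sketch below uses Python's `fractions` module; the dialect-and-feature scenario and all three probabilities are invented purely for illustration. It shows how a fairly reliable cue still yields a small posterior probability when the prior is small.

```python
from fractions import Fraction

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    # P(E) by the law of total probability over H and not-H.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Hypothetical numbers: 1 speaker in 100 is from dialect A; 90% of dialect-A
# speakers use feature X, but so do 5% of everyone else.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(p, float(p))  # 2/13, about 0.154
```

So hearing feature X raises the probability of dialect A from 1% to only about 15%, because most speakers who use X come from the much larger non-A group.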
Week 3
Histograms and counting things. Sampling errors. Read Gonick and Smith chapters 3 and 4, and the binomial part of chapter 5.
• Counting things. What's the difference between a frequency and a probability? Models vs. samples. (Random Variables.)
• Handout2B: Bayes' Theorem.
• Data entry in R and some plotting.
• Looking for correlations between two measurements. Live computer examples of scatter plots in R (and maybe SPSS).
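A common single-number summary of a scatter plot is Pearson's correlation coefficient r, which R computes with `cor()`. Below is a minimal Python sketch written out by hand so the formula is visible; the word-length and reading-time numbers are hypothetical. Note that r measures only linear association, so it is worth looking at the scatter plot before trusting it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements: word length (letters) vs. reading time (ms).
lengths = [3, 5, 6, 8, 10]
times = [210, 250, 265, 300, 340]
print(round(pearson_r(lengths, times), 3))
```

Values near +1 or -1 mean the points lie close to a straight line; values near 0 mean no linear trend (though a curved relationship may still be present).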
Week 4
Practical techniques for choosing samples of people. Hypothesis testing: z- and t-tests. Read Gonick and Smith chapters 5 and 6.
• Counting things and statistical sampling: opportunity sampling, stratified sampling, random sampling.
• z- and t-tests.
• z- and t-tests in R (software demonstration).
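The difference between the two tests is whether the population standard deviation is known (z) or estimated from the sample (t). This minimal Python sketch computes both statistics for one sample; the data and the "known" sigma of 2 are assumptions for illustration. The standard library has no t distribution, so only the z-test gets a p-value here; in R, `t.test()` does the whole job, and in Python `scipy.stats.ttest_1samp` is the usual choice.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_t(data, mu0):
    """t statistic for H0: population mean == mu0 (sigma estimated from data)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    return (statistics.mean(data) - mu0) / se, n - 1  # (t, degrees of freedom)

def one_sample_z(data, mu0, sigma):
    """z statistic and two-sided p-value when the population sigma is known."""
    z = (statistics.mean(data) - mu0) / (sigma / math.sqrt(len(data)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails of N(0, 1)
    return z, p

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean 5
t, df = one_sample_t(data, mu0=4)
z, p = one_sample_z(data, mu0=4, sigma=2)  # sigma=2 is an assumed known value
print(f"t = {t:.3f} on {df} df;  z = {z:.3f}, p = {p:.3f}")
```

With p around 0.16, this hypothetical sample gives no good reason to reject a population mean of 4.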
Week 5
More on hypothesis testing. Error bars and confidence intervals. Read Gonick and Smith chapters 7 and 8.
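A confidence interval for a mean is the sample mean plus or minus a critical value times the standard error. The Python sketch below uses the normal critical value (about 1.96 for 95%) from `statistics.NormalDist`; the data are hypothetical. For a sample this small the t-based interval is more appropriate and noticeably wider (the 95% t critical value for 7 degrees of freedom is about 2.36), and error bars drawn at plus or minus one standard error correspond to only about 68% coverage.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)     # standard error of the mean
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m - zcrit * se, m + zcrit * se

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
lo, hi = mean_ci(data)
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```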
Week 6
Experimental design and two-sample t-tests. ANOVA (analysis of variance) and linear regression. Read Gonick and Smith chapters 9, 10, and 11.
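A two-sample t-test compares the means of two groups, such as two experimental conditions. This minimal Python sketch uses Welch's version, which does not assume equal variances (R's `t.test()` also defaults to Welch; the classic pooled-variance Student test is the other common variant). The two groups of scores are hypothetical, and the p-value step is omitted because the standard library lacks the t distribution.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scores from two experimental conditions.
control = [1, 2, 3, 4, 5]
treated = [3, 4, 5, 6, 7]
t, df = welch_t(control, treated)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -2.00, df = 8.0
```

When the two groups have equal sizes and equal variances, as here, Welch's degrees of freedom coincide with the pooled test's n1 + n2 - 2.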
Week 7
Text classifiers.
• Good-Turing estimation: how to estimate the probability of rare events, such as the Sun not rising tomorrow. How do you compute the probability that the word "immunological" will appear in the next sentence, given a small corpus of text?
• Beginnings of Bayesian text classifiers.
• Architecture and practical considerations in text classifiers.
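The core of Good-Turing estimation fits in a few lines. Let N_r be the number of distinct words seen exactly r times and N the number of tokens; then the total probability of all unseen words is estimated as N_1 / N, and a word seen r times gets the adjusted count r* = (r + 1) N_{r+1} / N_r. This Python sketch is the raw, unsmoothed version applied to a made-up toy corpus. Its output shows the method's weakness: wherever N_{r+1} is zero the adjusted count collapses to 0, which is why practical versions (for example Gale and Sampson's "Simple Good-Turing") smooth the N_r first. Note also that N_1 / N is shared among *all* unseen words, so the probability of one particular unseen word such as "immunological" needs a further estimate of how many unseen words there are.

```python
from collections import Counter

def good_turing(tokens):
    """Unsmoothed Good-Turing estimates from a list of word tokens.

    Returns (p_unseen, adjusted): p_unseen is the total probability of all
    never-seen words; adjusted[r] is the adjusted count r* for words seen r times.
    """
    counts = Counter(tokens)        # word -> r (times seen)
    n_r = Counter(counts.values())  # r -> N_r (number of words seen r times)
    total = len(tokens)             # N, the number of tokens
    p_unseen = n_r[1] / total       # N_1 / N
    adjusted = {r: (r + 1) * n_r[r + 1] / n_r[r] for r in sorted(n_r)}
    return p_unseen, adjusted

# A tiny made-up corpus; real use needs far more text.
tokens = "the cat sat on the mat and the dog sat on the log".split()
p0, adj = good_turing(tokens)
print(p0)   # 5 singletons out of 13 tokens
print(adj)  # r = 2 and r = 4 collapse to 0 because N_3 = N_5 = 0
```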
Week 8
Modern techniques. Read Gonick and Smith chapter 12.
• Robust statistics.
• Monte-Carlo simulations. Monte-Carlo techniques let you test complicated and messy models simply. Question: if you flip a coin, how often do you get three heads in a row? Answer: let's do a million flips on the computer and see!
• Bootstrap resampling: an easy-to-remember but computer-intensive way to look at complex problems or to handle non-Gaussian data.
• Markov chain Monte Carlo example: one lump or two?
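Both the coin-flip simulation and the bootstrap are short enough to sketch directly. This is a minimal Python illustration, not the course's own code. The coin question is interpreted here as "in three flips, how often are all three heads?", whose exact answer is 1/8; counting runs inside one long sequence would need a different simulation. The bootstrap part is the percentile bootstrap for the median, one of several bootstrap interval methods, applied to hypothetical data with an outlier; in R the resampling step is `sample(x, replace = TRUE)`.

```python
import random
import statistics

def three_heads_rate(trials, rng):
    """Monte Carlo estimate of P(three fair coin flips all come up heads)."""
    hits = 0
    for _ in range(trials):
        if all(rng.random() < 0.5 for _ in range(3)):  # each True is one head
            hits += 1
    return hits / trials

def bootstrap_median_ci(data, rng, reps=10_000, level=0.95):
    """Percentile bootstrap confidence interval for the median."""
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(reps)
    )
    lo_idx = round((1 - level) / 2 * reps)      # e.g. the 250th smallest of 10,000
    hi_idx = round((1 + level) / 2 * reps) - 1  # e.g. the 9750th smallest
    return medians[lo_idx], medians[hi_idx]

rng = random.Random(1)  # fixed seed so runs are repeatable
rate = three_heads_rate(200_000, rng)
print(rate)  # should be close to 1/8 = 0.125
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical, with an outlier
ci = bootstrap_median_ci(data, rng)
print(ci)
```

No formula for the median's standard error is needed: the computer simply re-runs the "experiment" on resampled data many times.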

Last modified Thu Oct 22 15:05:01 2009, Greg Kochanski.