
Greg Kochanski

Syllabus: Statistics for Linguists
Hilary Term 2006.
Chapters refer to "The Cartoon Guide to Statistics," by Larry
Gonick and Woollcott Smith, HarperCollins Publishers, 1993. ISBN
0062731025. The course uses the open-source "R" package for
demonstrations, which can be obtained at http://www.r-project.org . A good
tutorial can be found at http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html .
Another tutorial is at http://personality-project.org/r/r.short.html
and a more advanced one is at http://www.psych.upenn.edu/~baron/rpsych/rpsych.html .
 Week 1

Scientific bloopers and summary statistics. Histograms and
graphical displays of data. Measures of position (mean and
median) and measures of spread (standard deviation and
interquartile range). Read Gonick and Smith chapters 1, 2.
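
As a small illustration of the difference between the two measures of position and the two measures of spread, here is a minimal sketch. The course demonstrations use R; this sketch uses Python's standard library instead, and the duration values are made up for the example (they are not course data). It shows how a single outlier pulls the mean and standard deviation much more than the median and interquartile range.

```python
import statistics

# Hypothetical data: eight vowel durations in milliseconds, one an outlier.
durations_ms = [180, 195, 200, 210, 215, 230, 240, 600]

# Measures of position.
mean_ms = statistics.mean(durations_ms)
median_ms = statistics.median(durations_ms)

# Measures of spread.
stdev_ms = statistics.stdev(durations_ms)            # sample standard deviation (n - 1)
q1, _, q3 = statistics.quantiles(durations_ms, n=4)  # quartile cut points
iqr_ms = q3 - q1

print(f"mean = {mean_ms:.2f}, median = {median_ms:.2f}")
print(f"standard deviation = {stdev_ms:.2f}, IQR = {iqr_ms:.2f}")
```

Note that different packages use slightly different conventions for computing quartiles, so the interquartile range reported by R for the same numbers may differ a little.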
 Week 2

Probability and Bayes' Theorem. Read Gonick and Smith chapter
3.
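
A worked example may help make Bayes' Theorem concrete. The sketch below is in Python rather than the course's R, and the function name `posterior` and all the numbers are assumptions chosen for illustration. It computes P(H | E) from a prior P(H) and the two likelihoods P(E | H) and P(E | not H).

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of the evidence, summed over both hypotheses.
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1.0 - prior))
    return p_evidence_given_h * prior / p_evidence


# Made-up numbers: the hypothesis is rare (1%), the evidence is likely
# under the hypothesis (90%) but also occurs 5% of the time without it.
p = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.05)
print(f"P(H | E) = {p:.3f}")
```

With these numbers the posterior is only about 0.15: even fairly strong evidence leaves a rare hypothesis unlikely.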
 Week 3

Histograms and Counting Things. Sampling errors. Read Gonick
and Smith chapters 3 and 4, and the binomial part of chapter 5.
 Counting things. What's the difference between a
frequency and a probability? Models vs. samples. (Random
Variables.)
 Handout 2B: Bayes' Theorem. [Source]
 Data entry in R and some plotting. [Source]
 Looking for correlations between two measurements. Live
computer examples of scatter plots in R (and maybe
SPSS).
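
The distinction between a frequency and a probability can be sketched with a small simulation. This is a Python sketch (the course itself uses R), and the model probability of 0.3, the sample sizes, and the function name `sample_frequency` are all assumptions made for the example. The probability belongs to the model; the frequency is what a particular sample happens to show, and it scatters around the probability with a sampling error that shrinks as the sample grows.

```python
import random


def sample_frequency(p_model, n_tokens, rng):
    """Draw n_tokens yes/no trials and return the observed relative frequency."""
    hits = sum(1 for _ in range(n_tokens) if rng.random() < p_model)
    return hits / n_tokens


rng = random.Random(2006)  # fixed seed so the run is repeatable
p_model = 0.3              # assumed probability in the model

freq_small = sample_frequency(p_model, 20, rng)
freq_large = sample_frequency(p_model, 20000, rng)

print(f"model probability:          {p_model}")
print(f"frequency in 20 trials:     {freq_small:.3f}")
print(f"frequency in 20000 trials:  {freq_large:.3f}")
```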
 Week 4

Real techniques for picking samples of people. Hypothesis
testing: z- and t-tests. Read Gonick and Smith chapters 5, 6.
 Counting things and statistical
sampling. Opportunity sampling, stratified sampling,
random sampling. [Source]
 z- and t-tests.
 z- and t-tests in R (software demonstration). [Source]
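
For readers who want to see the arithmetic behind the two tests, here is a minimal sketch in Python (the course's software demonstration is in R, where `t.test` does the full job). The function names, the sample values, and the assumed known standard deviation are all hypothetical. The z-test assumes the population standard deviation is known; the t statistic estimates it from the sample. Turning the t statistic into a p-value needs the t distribution, which is not in Python's standard library, so that step is left out.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """One-sample t statistic for the null hypothesis that the mean is mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test with known sigma."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


# Made-up measurements, tested against a hypothesised mean of 5.0.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
t = t_statistic(sample, mu0=5.0)
p_z = z_test_p_value(sample, mu0=5.0, sigma=0.2)  # sigma = 0.2 is assumed known

print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
print(f"z-test two-sided p-value = {p_z:.3f}")
```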
 Week 5

More on hypothesis testing. Error bars and confidence
intervals. Read Gonick and Smith chapters 7 and 8.
 Week 6
 Experimental design and two-sample t-tests. Read Gonick and
Smith chapters 9, 10, 11. ANOVA (Analysis of Variance) and
linear regression.
 Week 7

Text classifiers.

Good-Turing estimation: how to estimate the probability
that the Sun will not rise tomorrow, or other rare events.
How do you compute the probability that the word
"immunological" will appear in the next sentence, given a
small corpus of text? [Source]
 Beginnings of Bayesian text classifiers. [Source]
 Architecture and practical considerations in text
classifiers. [Source]
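
The core idea of Good-Turing estimation can be sketched in a few lines. This is only the simplest Good-Turing result, the estimate of the total probability of all unseen words as N1 / N, where N1 is the number of word types seen exactly once and N is the number of tokens; it is not necessarily the form presented in the lecture. The sketch is in Python rather than R, and the toy corpus and the function name `unseen_probability` are assumptions for illustration.

```python
from collections import Counter


def unseen_probability(tokens):
    """Good-Turing estimate of the total probability of unseen word types."""
    counts = Counter(tokens)
    n_total = len(tokens)                                  # N: number of tokens
    n_once = sum(1 for c in counts.values() if c == 1)     # N1: types seen once
    return n_once / n_total


# Toy corpus of nine tokens; four word types occur exactly once.
corpus = "the cat sat on the mat the dog sat".split()
p_unseen = unseen_probability(corpus)
print(f"estimated probability that the next word is new: {p_unseen:.3f}")
```

This probability is shared among all words not yet seen, so the estimate for one particular unseen word such as "immunological" is smaller still and depends on how many unseen words one assumes there are.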
 Week 8

Modern techniques. Read Gonick and Smith chapter 12.
 Robust statistics.

Monte Carlo simulations. Monte Carlo techniques let
you test complicated and messy models simply. Question:
If you flip a coin, how often do you get three heads in a
row? Answer: Let's do a million flips on the computer and
see! [Source]
 Bootstrap resampling: An easy-to-remember but computer-intensive
way to look at complex problems or to handle non-Gaussian
data. [Source]
 Markov chain Monte Carlo example: One lump or two?
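
The coin-flip question above can be tried directly. The following is a minimal sketch in Python (the course's own demonstration is in R and may be set up differently). It assumes a fair coin and reads "three heads in a row" as: the fraction of flips that are heads and are preceded by two more heads. The function name `three_heads_fraction` and the seed are hypothetical choices for the example.

```python
import random


def three_heads_fraction(n_flips, seed=0):
    """Fraction of flips that complete a run of at least three heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    run = 0    # length of the current run of heads
    hits = 0   # flips that end three (or more) heads in a row
    for _ in range(n_flips):
        if rng.random() < 0.5:  # heads, assuming a fair coin
            run += 1
        else:
            run = 0
        if run >= 3:
            hits += 1
    return hits / n_flips


fraction = three_heads_fraction(1_000_000)
print(f"fraction of flips completing three heads in a row: {fraction:.4f}")
```

Under this reading the exact answer for a fair coin is 1/8, so the simulation can be checked against theory; the appeal of the method is that the same loop still works when the model is too messy for an exact answer.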