Greg Kochanski
Syllabus - Statistics for Linguists
Hilary Term 2006.
Chapter numbers refer to "The Cartoon Guide to Statistics," by Larry
Gonick and Woollcott Smith, HarperCollins Publishers, 1993. ISBN
0-06-273102-5. The course uses the open-source "R" package for
demonstrations; it can be obtained at http://www.r-project.org . A good
tutorial can be found at http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html .
Other tutorials are at http://personality-project.org/r/r.short.html ,
and a more advanced one is at http://www.psych.upenn.edu/~baron/rpsych/rpsych.html .
- Week 1
- Scientific bloopers and summary statistics. Histograms and
graphical displays of data. Measures of position (mean and
median) and measures of spread (standard deviation and
inter-quartile range). Read Gonick and Smith chapters 1, 2.
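The Week 1 measures of position and spread can be computed with Python's standard `statistics` module. The sketch below swaps Python in for the R used in the course. The function name `summarize` and the duration values are hypothetical, chosen so that one outlier shows how differently the four measures react.

```python
import statistics

def summarize(data):
    """Measures of position (mean, median) and of spread (SD, IQR)."""
    # Quartile cut points; Python's default rule is method="exclusive".
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample SD, n - 1 denominator
        "iqr": q3 - q1,
    }

# Hypothetical vowel durations in ms; 250 is a deliberate outlier.
durations = [92, 105, 110, 98, 250, 101, 95, 107, 99, 103]
print(summarize(durations))
```

Here the single outlier pulls the mean (116) well above the median (102), and it inflates the standard deviation (about 47) far more than the inter-quartile range. R's default quantile rule differs from Python's, so `IQR()` in R may give a slightly different number for the same data.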
- Week 2
- Probability and Bayes' Theorem. Read Gonick and Smith chapter 3.
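Bayes' Theorem, P(H | E) = P(E | H) P(H) / P(E), can be written as a few lines of Python for a yes/no hypothesis. This is a minimal sketch; the function name and the dialect-word numbers are hypothetical illustrations, not course data.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a yes/no hypothesis H given evidence E.

    prior             = P(H)
    likelihood        = P(E | H)
    likelihood_if_not = P(E | not H)
    Returns P(H | E) = P(E | H) P(H) / P(E).
    """
    # Total probability of the evidence, summed over H and not-H.
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 30% of speakers in a corpus are Northern; 60% of
# Northern speakers use a given dialect word, versus 10% of the others.
print(round(posterior(prior=0.3, likelihood=0.6, likelihood_if_not=0.1), 2))  # 0.72
```

Hearing the word raises the probability that the speaker is Northern from 0.30 to 0.72, because 0.18 of the 0.25 total probability of hearing it comes from Northern speakers.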
- Week 3
- Histograms and counting things. Sampling errors. Read Gonick
and Smith chapters 3 and 4, and the binomial part of chapter 5.
- Counting things. What's the difference between a
frequency and a probability? Models vs. samples. (Random
Variables.)
- Handout2B:
Bayes' Theorem. [Source]
- Data
entry in R and some plotting. [Source]
- Looking for correlations between two measurements. Live
computer examples of scatter plots in R (and maybe
SPSS).
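The difference between a probability (a property of the model) and a frequency (a property of a sample) can be shown by simulation. This Python sketch assumes a hypothetical true probability of 0.3 and compares the observed frequency at several sample sizes with the binomial standard error sqrt(p(1 - p)/n). It is an illustration, not the course's R demonstration.

```python
import math
import random

def sample_frequency(p, n, rng):
    """Observed frequency of 'successes' in n independent trials with probability p."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(1)  # fixed seed so the run is repeatable
p = 0.3                 # hypothetical true probability (the model)
for n in (10, 100, 10_000):
    freq = sample_frequency(p, n, rng)
    # Binomial standard error of a sample frequency: sqrt(p (1 - p) / n).
    se = math.sqrt(p * (1 - p) / n)
    print(f"n={n:>6}  observed frequency={freq:.3f}  typical sampling error ~ ±{se:.3f}")
```

Small samples can give frequencies far from 0.3; the sampling error shrinks only as 1/sqrt(n).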
- Week 4
- Real techniques for picking samples of people. Hypothesis
testing: z- and t-tests. Read Gonick and Smith chapters 5 and 6.
- Counting things and statistical
sampling. Opportunity sampling, stratified sampling,
random sampling. [Source]
- z- and t-tests.
- Z- and t-tests
in R (software demonstration). [Source]
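The one-sample z- and t-statistics share one formula: (sample mean - hypothesized mean) / standard error. They differ only in whether the population SD is known (z) or estimated from the sample (t). The Python sketch below is a stand-in for the R demonstration; the function names and the rating data are hypothetical. The Python standard library has no t distribution, so only the z-test returns a p-value here; in R, `t.test` does the whole t-test.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z(sample, mu0, sigma):
    """z-test of H0: population mean == mu0, with known population SD sigma.
    Returns (z, two-sided p-value)."""
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def one_sample_t(sample, mu0):
    """t statistic for the same H0 when sigma is unknown and is estimated by
    the sample SD s. Returns (t, degrees of freedom)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

ratings = [5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2]  # hypothetical acceptability ratings
print(one_sample_t(ratings, mu0=5.0))
```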
- Week 5
- More on hypothesis testing. Error bars and confidence
intervals. Read Gonick and Smith chapters 7 and 8.
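An error bar is usually the standard error, s / sqrt(n), and a confidence interval stretches a multiple of it on either side of the mean. This Python sketch uses the normal approximation (about 1.96 standard errors for 95%). That is an assumption, and for small samples the t distribution with n - 1 degrees of freedom gives a somewhat wider interval. The function name is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean:
    mean ± z * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error: the usual error bar
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% interval
    return xbar - z * se, xbar + z * se

print(mean_ci([5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2]))  # hypothetical ratings
```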
- Week 6
- Experimental design and two-sample t-tests. ANOVA (analysis of
variance) and linear regression. Read Gonick and Smith chapters
9, 10, and 11.
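A two-sample t-test compares two group means against the combined standard error of their difference. The syllabus does not say which variant is taught. This Python sketch uses Welch's version, which does not assume equal variances, together with the Welch-Satterthwaite approximation for the degrees of freedom. The classic pooled-variance test is the alternative. The function name is hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical reaction times (ms) for two experimental conditions.
print(welch_t([512, 498, 530, 505, 521], [540, 552, 535, 561, 548]))
```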
- Week 7
- Text classifiers.
- Good-Turing estimation: how to estimate the probability of
rare events, such as the Sun not rising tomorrow. How do you
compute the probability that the word "immunological" will
appear in the next sentence, given a small corpus of text?
[Source]
- Beginnings of Bayesian
text classifiers. [Source]
- Architecture and practical considerations in text
classifiers. [Source]
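The core Good-Turing idea can be sketched in a few lines. The total probability reserved for never-seen words is estimated as N1/N, the fraction of tokens whose word occurs exactly once. A word seen r times gets the adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct words seen exactly r times. This is a minimal, unsmoothed Python version, and the function name and toy sentence are hypothetical. Practical implementations smooth the N_r counts, because raw counts give r* = 0 whenever no word occurs exactly r + 1 times.

```python
from collections import Counter

def good_turing(tokens):
    """Basic (unsmoothed) Good-Turing estimates from a list of word tokens.

    Returns (p_unseen, adjusted) where
      p_unseen    = N1 / N: total probability mass for words never seen,
      adjusted[r] = (r + 1) * N_{r+1} / N_r: adjusted count for words seen r times.
    """
    word_counts = Counter(tokens)
    count_of_counts = Counter(word_counts.values())  # r -> N_r
    n_tokens = len(tokens)
    p_unseen = count_of_counts[1] / n_tokens
    adjusted = {
        r: (r + 1) * count_of_counts[r + 1] / n_r
        for r, n_r in sorted(count_of_counts.items())
    }
    return p_unseen, adjusted

tokens = "the cat saw the dog and the cat ran".split()
print(good_turing(tokens))
```

If "immunological" never occurs in the corpus, its probability comes out of `p_unseen`, which must be shared among all the words that were not seen.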
- Week 8
- Modern techniques. Read Gonick and Smith chapter 12.
- Robust statistics.
- Monte Carlo simulations. Monte Carlo techniques let you test
complicated, messy models simply. Question: if you flip a coin,
how often do you get three heads in a row? Answer: let's do a
million flips on the computer and see! [Source]
- Bootstrap
resampling: An easy-to-remember but computer-intensive
way to look at complex problems or to handle non-Gaussian
data. [Source]
- Markov-Chain Monte Carlo example: One lump or two?
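The "three heads in a row" question can be answered by brute force, as the Monte Carlo item suggests. This Python sketch interprets the question as "what fraction of three-flip windows in a long sequence are all heads?". That reading is an assumption, and its exact answer is 1/8, which makes the simulation easy to check. The function name is hypothetical.

```python
import random

def three_heads_rate(n_flips, seed=0):
    """Monte Carlo estimate of the fraction of overlapping three-flip
    windows in which all three flips are heads."""
    rng = random.Random(seed)
    run = 0   # length of the current run of consecutive heads
    hits = 0  # windows ending at this flip that are all heads
    for _ in range(n_flips):
        if rng.random() < 0.5:  # heads
            run += 1
            if run >= 3:
                hits += 1
        else:
            run = 0
    return hits / (n_flips - 2)  # number of complete three-flip windows

print(three_heads_rate(1_000_000))  # should be close to the exact value 1/8 = 0.125
```

The same loop still works when the model gets messier (a biased coin, or flips that depend on the previous one), which is where simulation beats pencil-and-paper probability.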
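Bootstrap resampling, mentioned above, needs no distributional assumptions. You resample the data with replacement, recompute the statistic each time, and read the uncertainty off the spread of the results. This Python sketch computes a percentile bootstrap interval for a median, which has no simple textbook formula. The function name, the default of 10,000 resamples and the duration data are hypothetical choices.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement,
    # each the same size as the original data.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical, skewed vowel durations in ms.
durations = [92, 105, 110, 98, 250, 101, 95, 107, 99, 103]
print(bootstrap_ci(durations))
```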