Greg Kochanski

Syllabus - Statistics for Linguists

Hilary Term 2006.

Chapters refer to "The Cartoon Guide to Statistics," by Larry Gonick and Woollcott Smith, HarperCollins Publishers, 1993. ISBN 0-06-273102-5. The course uses the open-source "R" package for demonstrations, which can be obtained at http://www.r-project.org . A good tutorial can be found at http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html . Other tutorials are at http://personality-project.org/r/r.short.html and, more advanced, at http://www.psych.upenn.edu/~baron/rpsych/rpsych.html .

Week 1

Scientific bloopers and summary statistics. Histograms and graphical displays of data. Measures of position (mean and median) and measures of spread (standard deviation and inter-quartile range). Read Gonick and Smith chapters 1, 2.

Case studies of pathological science and systematic errors. [Source]
Summary statistics: means, medians, standard deviations, et cetera.
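As a rough illustration of the measures of position and spread listed above, here is a minimal Python sketch (the course demonstrations themselves use R). The data values are hypothetical, invented only to show how a single outlier affects each statistic.

```python
import statistics

# Hypothetical data: vowel durations in milliseconds (made up for illustration).
durations_ms = [82, 95, 101, 110, 87, 93, 250, 99]

mean = statistics.mean(durations_ms)
median = statistics.median(durations_ms)
stdev = statistics.stdev(durations_ms)  # sample standard deviation (divides by n - 1)

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(durations_ms, n=4)
iqr = q3 - q1  # inter-quartile range

print(f"mean={mean:.1f}  median={median:.1f}  sd={stdev:.1f}  IQR={iqr:.1f}")
```

The single large value (250) pulls the mean and the standard deviation upward, while the median and the inter-quartile range barely move; this is one reason to report both kinds of summary.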

Week 2

Probability and Bayes' Theorem. Read Gonick and Smith chapter 3.

Why use probabilities and statistics? [Source]
A Condensed Summary of Probability. [Source]
Live computer examples of loading R (and maybe SPSS), entering data, computing summary statistics and histograms. Reading data and basic processing in R.
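A small worked example may help fix the idea behind Bayes' Theorem for a yes/no hypothesis. This is a Python sketch rather than course material; the function name and all the numbers are assumptions chosen for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) for a binary hypothesis H using Bayes' Theorem."""
    numerator = p_evidence_given_h * prior
    # Total probability of the evidence, summed over H and not-H.
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence

# Hypothetical numbers: 1% of words in a corpus are loanwords; a detector
# flags 90% of loanwords and (wrongly) 5% of native words.
p = posterior(prior=0.01, p_evidence_given_h=0.90, p_evidence_given_not_h=0.05)
print(f"P(loanword | flagged) = {p:.3f}")
```

Even with a fairly accurate detector, most flagged words are not loanwords here, because loanwords are rare to begin with.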

Week 3

Histograms and Counting Things. Sampling errors. Read Gonick and Smith chapters 3 and 4, and the binomial part of chapter 5.

Counting things. What's the difference between a frequency and a probability? Models vs. samples. (Random Variables.)
Handout2B: Bayes' Theorem. [Source]
Data entry in R and some plotting. [Source]
Looking for correlations between two measurements. Live computer examples of scatter plots in R (and maybe SPSS).
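One common way to put a number on the relationship seen in a scatter plot is the Pearson correlation coefficient. The following Python sketch computes it from its definition; the paired measurements are hypothetical. In R, the built-in cor() function does the same job.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired measurements (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```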

Week 4

Real techniques for picking samples of people. Hypothesis testing: z- and t-tests. Read Gonick and Smith chapters 5, 6.

Counting things and statistical sampling. Opportunity sampling, stratified sampling, random sampling. [Source]
z- and t-tests.
Z- and t-tests in R (software demonstration). [Source]
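The arithmetic behind one-sample z- and t-tests is short enough to sketch directly. The Python below is an illustration under stated assumptions: the sample, the null-hypothesis mean and the "known" population standard deviation are all hypothetical. The z-test p-value uses the standard library's normal distribution; the t statistic is computed but not converted to a p-value, because that needs the t distribution with n - 1 degrees of freedom, which R's t.test() supplies.

```python
import math
import statistics
from statistics import NormalDist

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

def t_statistic(sample, mu0):
    """One-sample t statistic, using the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))

# Hypothetical measurements; mu0 and sigma are assumed values.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.2, 5.7, 5.4]
z, p_value = z_test(sample, mu0=5.0, sigma=0.3)
t = t_statistic(sample, mu0=5.0)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}, t = {t:.2f}")
```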

Week 5

More on hypothesis testing. Error bars and confidence intervals. Read Gonick and Smith chapters 7 and 8.

Connecting hypothesis testing with confidence intervals and random variables. Hypothesis testing via Bayes' Theorem. [Source]
Odds-ratio form of Bayes' Theorem; practical problems and considerations in applying Bayes' Theorem. [Source]
Hypothesis testing by eye. [Source]
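In its odds-ratio form, Bayes' Theorem says that the posterior odds equal the prior odds multiplied by the likelihood ratio. A minimal Python sketch of that update follows; the function names and the numbers are assumptions for illustration only.

```python
def update_odds(prior_odds, likelihood_ratio):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favour of H into the probability of H."""
    return odds / (1.0 + odds)

# Hypothetical numbers: the hypothesis has prior probability 0.01, and the
# evidence is 18 times more likely if the hypothesis is true (0.90 / 0.05).
prior_odds = 0.01 / 0.99
likelihood_ratio = 0.90 / 0.05
posterior_odds = update_odds(prior_odds, likelihood_ratio)
posterior_prob = odds_to_probability(posterior_odds)
print(f"posterior odds = {posterior_odds:.3f}, probability = {posterior_prob:.3f}")
```

A convenience of this form is that independent pieces of evidence can be applied one after another, by multiplying in each likelihood ratio in turn.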

Week 6

Experimental design and two-sample t-tests. ANOVA (Analysis of Variance) and linear regression. Read Gonick and Smith chapters 9, 10, 11.

Week 7

Text classifiers.

Good-Turing Estimation: how to estimate the probability that the Sun will not rise tomorrow, or other rare events. How do you compute the probability that the word "immunological" will appear in the next sentence, given a small corpus of text? [Source]
Beginnings of Bayesian text classifiers. [Source]
Architecture and practical considerations in text classifiers. [Source]
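The simplest Good-Turing result can be sketched in a few lines: the total probability of all not-yet-seen word types is estimated as N1 / N, where N1 is the number of word types seen exactly once and N is the total number of tokens. The Python below shows only this "missing mass" estimate, not the full smoothed estimator, and the toy corpus is invented for illustration.

```python
from collections import Counter

def unseen_probability(tokens):
    """Good-Turing estimate of the chance that the next token is a new word type."""
    counts = Counter(tokens)
    n_total = sum(counts.values())                       # N: total tokens
    n_once = sum(1 for c in counts.values() if c == 1)   # N1: types seen once
    return n_once / n_total

# Toy corpus (hypothetical).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
p_unseen = unseen_probability(tokens)
print(f"Estimated probability of an unseen word: {p_unseen:.3f}")
```

Note that this is the probability shared among all unseen words; the estimate for one particular unseen word, such as "immunological", would need a further assumption about how many unseen words there are.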

Week 8

Modern techniques. Read Gonick and Smith chapter 12.

Robust statistics.
Monte-Carlo simulations. Monte-Carlo techniques let you test complicated and messy models simply. Question: If you flip a coin, how often do you get three heads in a row? Answer: Let's do a million flips on the computer and see! [Source]
Bootstrap resampling: An easy-to-remember but computer-intensive way to look at complex problems or to handle non-Gaussian data. [Source]
Markov-Chain Monte Carlo example: One lump or two?
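The coin-flip question and the bootstrap idea above can both be sketched briefly in Python. This is an illustration under assumptions, not the course's own demonstration: "three heads in a row" is interpreted here as the fraction of positions at which a run of three heads starts (overlapping runs counted), the data set is hypothetical, and the bootstrap interval is the plain percentile interval.

```python
import random
import statistics

def fraction_three_heads(n_flips, rng):
    """Fraction of starting positions followed by three heads in a row."""
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    hits = sum(
        1 for i in range(n_flips - 2)
        if flips[i] and flips[i + 1] and flips[i + 2]
    )
    return hits / (n_flips - 2)

def bootstrap_median_interval(data, n_resamples, rng, level=0.95):
    """Percentile bootstrap interval for the median."""
    # Resample with replacement, same size as the data, and record each median.
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    lo_index = int((1.0 - level) / 2.0 * n_resamples)
    hi_index = int((1.0 + level) / 2.0 * n_resamples) - 1
    return medians[lo_index], medians[hi_index]

rng = random.Random(2006)  # fixed seed so the run is repeatable

frac = fraction_three_heads(1_000_000, rng)
print(f"Fraction of positions starting three heads: {frac:.4f}")

# Hypothetical, clearly non-Gaussian data (one large outlier).
data = [82, 95, 101, 110, 87, 93, 250, 99]
low, high = bootstrap_median_interval(data, 2000, rng)
print(f"95% bootstrap interval for the median: {low} to {high}")
```

For a fair coin the simulated fraction should come out close to 1/8, which is a useful check that the simulation is doing what was intended before it is applied to a messier model.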

Last Modified Thu Oct 22 15:05:01 2009