Script qdg_classifier

A linear or quadratic classifier in which the split into test and training sets is controlled by the -group flag. The idea is that you define groups, and each group is treated as a unit when the data are split into the test and training sets.

Why would you want to do this? For instance, suppose you are building classifiers to separate languages and you have several subjects per language. It is possible that your classifier is learning peculiarities of the individual subjects rather than properties of the language.

Without grouping, half of subject A's data might be in the training set and half in the test set. The classifier may learn A's properties and then use that knowledge in the test. Thus, without grouping, the classifier can succeed without generalizing from subject to subject.
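To make the leak concrete, here is a minimal Python sketch that is not taken from qdg_classifier: it splits items one at a time at random, with no grouping, and reports which subjects end up in both the training and the test set. The function name subjects_on_both_sides and the made-up subject labels are hypothetical.

```python
import random

def subjects_on_both_sides(groups, test_fraction=0.25, seed=0):
    """Split items one by one (no grouping); return subjects found in both sets."""
    idx = list(range(len(groups)))
    random.Random(seed).shuffle(idx)          # reproducible random permutation
    n_test = round(test_fraction * len(idx))
    test, train = idx[:n_test], idx[n_test:]
    return {groups[i] for i in train} & {groups[i] for i in test}

# Hypothetical data: six subjects, A-F, with 20 items each (120 items in total).
groups = [s for s in "ABCDEF" for _ in range(20)]
print(sorted(subjects_on_both_sides(groups)))
```

With 30 test items and 20 items per subject, the test set cannot consist only of whole subjects, so at least one subject is guaranteed to straddle the split; typically all six do.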

With grouping, the classifier needs to learn the language from (e.g.) subjects A, B, C and then extrapolate that knowledge onto other subjects from the same language (e.g. D, E, F). Thus, the classifier is forced to learn general properties of the language, not specific properties of the individual.
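A grouped split along these lines might look like the following minimal sketch. It is an illustration, not the script's actual code: the function name grouped_split, the test_fraction and seed parameters, and the choice to pick test groups separately within each class (so every language keeps some subjects for training and some for testing) are all assumptions.

```python
import random
from collections import defaultdict

def grouped_split(classes, groups, test_fraction=0.25, seed=0):
    """Return (train, test) lists of item indices; no group lands on both sides.

    Test groups are drawn separately within each class, so a class with at
    least two groups keeps at least one for training and one for testing.
    Assumes every group belongs to exactly one class (e.g. one language per subject).
    """
    groups_of_class = defaultdict(set)
    for c, g in zip(classes, groups):
        groups_of_class[c].add(g)

    rng = random.Random(seed)
    test_groups = set()
    for c in sorted(groups_of_class):
        names = sorted(groups_of_class[c])
        rng.shuffle(names)
        if len(names) < 2:
            continue  # a lone group cannot be split; keep it for training
        n_test = min(len(names) - 1, max(1, round(test_fraction * len(names))))
        test_groups.update(names[:n_test])

    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test

# Hypothetical data: two languages, three subjects each, two items per subject.
classes = ["eng"] * 6 + ["fra"] * 6
groups = list("AABBCCDDEEFF")
train, test = grouped_split(classes, groups, test_fraction=0.34)
print(sorted({groups[i] for i in train}), sorted({groups[i] for i in test}))
```

Because whole subjects move together, a subject's items can never appear on both sides, which is exactly the property the grouped split is meant to guarantee.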

To use this, give the -group PATTERN NUM switch. PATTERN is a regular expression, possibly including parenthesized regions (as with the group() method of a Python re match object), and NUM is an integer that selects which region is used as the group name. NUM=0 means the text matched by the entire regular expression, NUM=1 means the first parenthesized region, and so on.
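The following sketch shows how a PATTERN and NUM pair could turn a data label into a group name. It assumes that labels are strings, that PATTERN is applied with Python's re.match (anchored at the start of the label; the script might use re.search instead), and it uses a hypothetical label format such as eng_subjA_utt03. None of these details come from the script itself.

```python
import re
from typing import Optional

def group_name(label: str, pattern: str, num: int) -> Optional[str]:
    """Return the group name for one label, or None if PATTERN does not match."""
    m = re.match(pattern, label)   # assumption: match anchored at label start
    if m is None:
        return None
    return m.group(num)            # num=0: whole match; num=1: first region; ...

labels = ["eng_subjA_utt03", "eng_subjB_utt11", "fra_subjD_utt02"]
# Group by speaker: the second parenthesized region, e.g. "subjA".
print([group_name(s, r"([a-z]+)_(subj[A-Z])", 2) for s in labels])
```

Choosing NUM=1 with the same pattern would instead group by the language prefix (eng, fra), which would put all subjects of a language on the same side of the split.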

See the usage notes and flags for l_classifier.

Notes:

Classifying 2300 entries into 5 groups based on a 3-dimensional feature vector takes about 40 minutes (in 2010, on a single processor).
This code was described in an appendix to "Dimensions of durational variation in speech", by Anastassia Loukina, Greg Kochanski, Burton Rosner, Chilin Shih, and Elinor Keane, submitted in 2010 to the Journal of the Acoustical Society of America.

Functions

go_group_q(fd, n_per_dim=10, ftrim=None, grouper=None, coverage=3, ftest=0.25, verbose=True, modify_class=None)

go_group_l(fd, n_per_dim=10, ftrim=None, grouper=None, coverage=3, ftest=0.25, verbose=True, modify_class=None)

Variables
	__package__ = `None`

Imports: die, fiatio, g_closure, Q, QC, LC