A linear or quadratic classifier in which the split into test and training sets is controlled by the -group flag. You define groups, and each group is treated as a unit when the data is split into the test and training sets.
Why would you want to do this? For instance, if you are building classifiers to separate languages, and you have several subjects per language, it is possible that your classifier is learning peculiarities of the individual subjects, rather than properties of the language.
Without grouping, half of subject A's data might be in the training set and half in the test set. The classifier may learn A's properties and then use that knowledge in the test. Thus, without grouping, the classifier can succeed without generalizing from subject to subject.
With grouping, the classifier needs to learn the language from (e.g.) subjects A, B, C and then extrapolate that knowledge onto other subjects from the same language (e.g. D, E, F). Thus, the classifier is forced to learn general properties of the language, not specific properties of the individual.
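The difference between the two kinds of split can be sketched as follows. This is a minimal illustration, not the module's own code: the function name grouped_split, its parameters, and the (subject, index) data layout are all hypothetical.

```python
import random
from collections import defaultdict

def grouped_split(items, group_of, test_fraction=0.5, seed=0):
    """Split items into (train, test) so that no group straddles both sets.

    Hypothetical helper: `group_of` maps an item to its group name.
    """
    # Collect the items belonging to each group.
    by_group = defaultdict(list)
    for item in items:
        by_group[group_of(item)].append(item)

    # Shuffle whole groups, not individual items.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)

    # Assign the first n_test groups to the test set, the rest to training.
    n_test = max(1, int(round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [x for g in groups if g not in test_groups for x in by_group[g]]
    test = [x for g in groups if g in test_groups for x in by_group[g]]
    return train, test

# Hypothetical data: four samples from each of six subjects, A to F.
data = [(subject, i) for subject in "ABCDEF" for i in range(4)]
train, test = grouped_split(data, group_of=lambda d: d[0])
```

Because whole subjects are assigned to one side or the other, every test sample comes from a subject the classifier never saw during training.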
To use this, give the -group PATTERN NUM switch. PATTERN is a regular expression, possibly including parenthesized regions (see re.match.group), and NUM is an integer that selects which region is used as the group name. NUM=0 means the text matched by the entire regular expression, NUM=1 means the first parenthesized region, etc.
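How PATTERN and NUM pick out a group name can be illustrated with Python's re module. This is a sketch under assumptions: the helper group_name and the label format "en_A_001" are hypothetical, and whether the classifier anchors the pattern at the start of the string (re.match) or searches anywhere in it is not stated here.

```python
import re

def group_name(pattern, num, text):
    """Return region `num` of `pattern` matched against `text`, or None.

    Hypothetical helper; assumes the pattern is matched from the start
    of the string, as re.match does.
    """
    m = re.match(pattern, text)
    if m is None:
        return None
    return m.group(num)

# Hypothetical label format: language, subject, sample number.
pattern = r"([a-z]+)_([A-Z])_\d+"
label = "en_A_001"

whole = group_name(pattern, 0, label)     # NUM=0: the entire match
language = group_name(pattern, 1, label)  # NUM=1: first parenthesized region
subject = group_name(pattern, 2, label)   # NUM=2: second parenthesized region
```

With this hypothetical label format, NUM=2 would group the data by subject, which is the grouping the language example above calls for.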
See the usage notes and flags for l_classifier.
Variables: __package__ = None
Imports: die, fiatio, g_closure, Q, QC, LC
Generated by Epydoc 3.0.1 on Wed Dec 8 11:00:41 2010 | http://epydoc.sourceforge.net