Package classifiers :: Module q_classifier_r

Module q_classifier_r

source code

This is a support module, used by many types of classifiers.

Classes
  datum_c
This is an unclassified datum, either in the test or training set.
  datum_tr
This is a datum where we know the true class, presumably in the training set.
  grouper_c
A 'grouper' function takes a DUID (a unique i.d.) and maps it to a group name.
  model_template
This class describes how to compute the relative probability that a datum is a member of a particular class.
  qmodel
  lmodel
  classifier_desc
This is a thing that describes and generates classifiers.
  classifier
This is the base class for all kinds of classifiers.
  evaluate_match_w_rare
This is called in the same way as evaluate_match or evaluate_Bayes.
Functions
str
Hash(dl)
Returns: a hash of the UIDs of data items.
source code
str
Hash1(l)
Returns: a hash of a vector.
source code
 
prior(training)
This computes the probability of correct classification, assuming you can't see the feature vector.
source code
 
max_correct(training, testing)
This is a hard, conservative upper limit for the probability of correct classification.
source code
list(datum_tr)
read_data(fd, commentarray=None)
Reads in feature vectors where the first element is the true class.
source code
int
get_dim(fd)
This function takes a list of data (type datum_tr) and makes sure that they all have the same length feature vector.
source code
 
compute_cross_class(training, testing, modelchoice=None, n_per_dim=None, builder=None, classout=None, trainingset_name=None, modify_class=None, verbose=True)
Build classifiers based on the training set, and test them on the testing set.
source code
 
compute_self_class(d, coverage=None, ftest=None, modelchoice=None, n_per_dim=None, modify_class=None, builder=None, classout=None, verbose=True)
Modelchoice here is expected to take one argument: the data.
source code
 
list_groups(d, gr) source code
 
compute_group_class(dg, modelchoice=None, n_per_dim=None, builder=None, classout=None, ftest=None, grouper=None, coverage=None, modify_class=None, verbose=True)
This function makes sure that the training set and testing set come from different groups.
source code
 
qzmodel(ndim) source code
 
lzmodel(ndim) source code
 
forest_build(data, N, modelchoice=None, trainingset_name=None)
Build a forest of classifiers.
source code
int
evaluate_match(cl, data)
This can be passed into a classifier descriptor as the evaluate argument.
source code
float
evaluate_Bayes(cl, data, constrain=0.0)
Evaluates the negative log of the probability that the classifier would assign to the datum being in the observed class (i.e. whatever class is specified in the datum_tr).
source code
 
default_writer(summary, out, classout, wrong, fname='classes.chunk')
This writes out classifiers to a data file.
source code
 
count_classes(data)
Count how many instances there are of each class.
source code
 
list_classes(data)
List the names of the classes in a dataset, with the most populous classes first.
source code
str
name_of_evaluator(e)
Used to get the name of an evaluator, to write it to a file header.
source code
 
evaluator_from_name(nm)
Maps a name to a function that will evaluate how well a classifier performs.
source code
 
default_modify_class(qc, training_counts, testing_counts)
Modifies a classifier so it isn't so dominated by the most frequent classes.
source code
Variables
  ERGCOVER = 4.0
  D = False
  CONSTRAIN = 1e-06
  __package__ = 'classifiers'

Imports: re, math, zlib, numpy, chunkio, DS, die, mcmc, mcmc_helper, g_implements, fiatio, dictops, gpkmisc, DV, gpkavg


Function Details

Hash(dl)

source code 
Parameters:
  • dl (list(datum_c)) - A list of data
Returns: str
a hash of the UIDs of data items.

Hash1(l)

source code 
Parameters:
  • l - A list of data
Returns: str
a hash of a vector.

prior(training)

source code 

This computes the probability of correct classification, assuming you can't see the feature vector. It is used to compute P(chance). It assumes that you choose class C with probability 1 if P(C) is the biggest among all the classes.
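The rule described above (choose the class with the largest prior, with probability 1) reduces to the frequency of the most common class. A minimal sketch, taking a plain list of true class labels in place of the module's training-set objects:

```python
from collections import Counter

def prior_probability(true_classes):
    """P(chance): the accuracy of always guessing the most frequent
    class, ignoring the feature vector entirely."""
    counts = Counter(true_classes)
    return max(counts.values()) / len(true_classes)
```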

read_data(fd, commentarray=None)

source code 

Reads in feature vectors where the first element is the true class. This is the main data input for l_classifier, qdg_classifier and qd_classifier.

Parameters:
  • fd (file)
  • commentarray (a list or None)
Returns: list(datum_tr)
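The input format implied above is one record per line: the true class first, then the feature vector. A sketch of parsing a single such line (the whitespace-separated layout is inferred from the description, not confirmed by it):

```python
def parse_line(line):
    """Split one record into (true_class, feature_vector).
    Assumes whitespace-separated fields with the class label first."""
    tokens = line.split()
    return tokens[0], [float(x) for x in tokens[1:]]
```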

get_dim(fd)

source code 

This function takes a list of data (type datum_tr) and makes sure that they all have the same length feature vector. If so, it reports the length (dimension) of the feature vector.

Parameters:
  • fd (list(datum_tr)) - the list of data to check
Returns: int
length of vectors
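The check described above is simple enough to sketch directly: collect the distinct vector lengths, insist there is exactly one, and return it. This is an illustrative reimplementation over plain lists, not the module's own code:

```python
def get_dim_sketch(vectors):
    """Verify every feature vector has the same length; return it.
    Raises ValueError if the lengths disagree."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError("inconsistent feature vector lengths: %s"
                         % sorted(dims))
    return dims.pop()
```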

compute_cross_class(training, testing, modelchoice=None, n_per_dim=None, builder=None, classout=None, trainingset_name=None, modify_class=None, verbose=True)

source code 

Build classifiers based on the training set, and test them on the testing set. Modelchoice here is the completed class object, not a closure.

compute_group_class(dg, modelchoice=None, n_per_dim=None, builder=None, classout=None, ftest=None, grouper=None, coverage=None, modify_class=None, verbose=True)

source code 

This function makes sure that the training set and testing set come from different groups. The 'grouper' returns a group name, when given a datum. Modelchoice is expected to take one argument, the training set.

Parameters:
  • grouper (function from datum_tr to str) - function returning a group name for each datum
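The guarantee above (training and testing sets never share a group) amounts to a leave-one-group-out split. A sketch of that partitioning, with the grouper applied to plain tuples rather than datum_tr objects:

```python
from collections import defaultdict

def split_by_group(data, grouper):
    """Yield (group_name, train, test) triples where the test set is
    one whole group and the training set is every other group, so the
    two sets never share a group."""
    groups = defaultdict(list)
    for d in data:
        groups[grouper(d)].append(d)
    for name in groups:
        train = [d for g, members in groups.items() if g != name
                 for d in members]
        yield name, train, groups[name]
```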

forest_build(data, N, modelchoice=None, trainingset_name=None)

source code 

Build a forest of classifiers.

Parameters:
  • data (datum_c) - data to train the classifiers on.
  • N (int) - how many classifiers to build.
  • modelchoice (subclass of model_template) - what kind of classifier to build
  • trainingset_name (str) - (stored for later use).

evaluate_match(cl, data)

source code 

This can be passed into a classifier descriptor as the evaluate argument. It returns the number of exact matches between the classified data and the input, true classification.

Parameters:
  • cl (typically a subclass of classifier) - some classifier that has a bestc() method.
  • data (typically a list of datum_c or a subclass) - the data points, each carrying its true class.
Returns: int
the number of exact matches between the predicted and true classes

evaluate_Bayes(cl, data, constrain=0.0)

source code 

Evaluates the negative log of the probability that the classifier would assign to the datum being in the observed class (i.e. whatever class is specified in the datum_tr). Obviously, you want this to be a relatively small number.

Returns: float
the negative log of the probability of being in the observed class.

If cl.cdesc.ftrim is not None, we assume that some of the data in each class are dubious and should be ignored if they are sufficiently improbable. We modify the probability scores of the data that are among the worst (the cl.cdesc.ftrim[0] fraction), limiting those scores to be no more than cl.cdesc.ftrim[1] above the best score. This lets you limit by score, limit by fraction, or any mixture in between. If cl.cdesc.ftrim is None, no limiting or trimming is done.
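The ftrim rule above can be sketched on a plain list of scores. Scores here are negative log-probabilities (smaller is better), and the worst `ftrim[0]` fraction is capped at the best score plus `ftrim[1]`; the rounding of the fraction to a count is an assumption of this sketch:

```python
def trim_scores(scores, ftrim):
    """Cap the worst ftrim[0] fraction of scores at best + ftrim[1].
    With ftrim=None the scores pass through unchanged."""
    if ftrim is None:
        return list(scores)
    frac, delta = ftrim
    cap = min(scores) + delta
    n_trim = int(frac * len(scores))
    # Indices of the n_trim largest (worst) scores.
    worst = sorted(range(len(scores)),
                   key=lambda i: scores[i])[len(scores) - n_trim:]
    out = list(scores)
    for i in worst:
        out[i] = min(out[i], cap)
    return out
```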

default_writer(summary, out, classout, wrong, fname='classes.chunk')

source code 

This writes out classifiers to a data file.

Attention: out needs to be a list, not an iterator, because we use it twice.

count_classes(data)

source code 

Count how many instances there are of each class.

Parameters:
  • data (datum_c) - the data to count
Returns: map from str to int

list_classes(data)

source code 

List the names of the classes in a dataset, with the most populous classes first.

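Ordering class names by frequency is a one-liner with the standard library; a sketch over a plain list of labels rather than the module's datum objects:

```python
from collections import Counter

def list_classes_sketch(true_classes):
    """Class names ordered by frequency, most populous first."""
    return [c for c, _ in Counter(true_classes).most_common()]
```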

name_of_evaluator(e)

source code 

Used to get the name of an evaluator, to write it to a file header.

Parameters:
  • e (function, preferably with a __name__ attribute)
Returns: str

evaluator_from_name(nm)

source code 

Maps a name to a function that will evaluate how well a classifier performs.

Parameters:
  • nm (str) - a printable name
Returns:
a function
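name_of_evaluator and evaluator_from_name are inverses of a sort; a sketch of the likely pattern, using an explicit registry (the real module presumably knows evaluate_match, evaluate_Bayes, and evaluate_match_w_rare; the registry argument here is this sketch's own device):

```python
def name_of_evaluator_sketch(e):
    """Prefer the callable's __name__; fall back to str() for
    unnamed callables."""
    return getattr(e, "__name__", str(e))

def evaluator_from_name_sketch(nm, registry):
    """Inverse lookup through an explicit registry of evaluators."""
    try:
        return registry[nm]
    except KeyError:
        raise KeyError("unknown evaluator: %r" % nm)
```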

default_modify_class(qc, training_counts, testing_counts)

source code 

Modifies a classifier so it isn't so dominated by the most frequent classes.

Parameters:
  • training_counts (map str to int) - how many data are there in each class in the training set
  • testing_counts (map str to int) - how many data are there in each class in the testing set
  • qc (classifier)
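The docs do not say how default_modify_class rebalances the classifier, so the following is purely illustrative of one common choice: compute per-class additive score offsets that cancel the training-set log-priors, so a class cannot win merely by being frequent.

```python
import math

def prior_offsets(training_counts):
    """Additive offsets (one per class) that cancel each class's
    training log-prior. Illustrative only; the rule actually used by
    default_modify_class is not documented here."""
    total = sum(training_counts.values())
    return {c: -math.log(n / total) for c, n in training_counts.items()}
```

Rarer classes get larger offsets, boosting them relative to the dominant ones.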