Package gmisclib :: Module edit_distance :: Class text_cost

Class text_cost

This is useful for differencing documents that have been parsed into lists of words. It computes the cost of a change of words by an edit distance computation on the characters within the words. (It caches the costs of commonly encountered words for speed.) Instances of this class can be passed as a cost function to distf.
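
The idea described above (a word-to-word cost derived from a character-level edit distance, with a cache for repeated word pairs) can be sketched as follows. This is only an illustration, not the module's implementation: char_edit_distance and make_cached_word_cost are hypothetical names, unit character costs are assumed, and the cache uses the standard library's functools.lru_cache, which may differ from what text_cost does internally.

```python
from functools import lru_cache


def char_edit_distance(a, b):
    """Plain Levenshtein distance between two strings (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def make_cached_word_cost(cachesize=None):
    """Hypothetical helper: build a cached word-to-word cost function.

    cachesize=None means an unlimited cache, mirroring the documented
    meaning of the cachesize parameter.
    """
    @lru_cache(maxsize=cachesize)
    def word_cost(a, b):
        return float(char_edit_distance(a, b))
    return word_cost


cost = make_cached_word_cost(cachesize=1024)
print(cost("colour", "color"))   # computed by the character-level routine
print(cost("colour", "color"))   # same pair again: served from the cache
print(cost.cache_info().hits)
```

Caching pays off because ordinary documents repeat a small vocabulary many times, so the same word pairs are compared again and again.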

Instance Methods
 
__init__(self, costfac=None, cachesize=None, word_cost=None, frac_exp=0.0)
Create an instance.
__call__(self, a, b)
To be used by distf. Returns a cost, presumably a float.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, costfac=None, cachesize=None, word_cost=None, frac_exp=0.0)
(Constructor)

Create an instance.

Parameters:
  • costfac (function(str,str): float or None) - None, or an arbitrary function whose return value multiplies the scaled edit distance between the two words. It is called as self.costfac(a,b), where either a or b can be None to represent an insertion or deletion.
  • cachesize (None (meaning unlimited) or int) - the size of the cache of word-to-word distances to keep.
  • word_cost (function(str,str): float or None (meaning use def_cost).) - cost function to use inside words: this is a cost for insertion, deletion, or substitution of letters.
  • frac_exp (float) - An exponent for scaling the edit distance by the length of words.
Overrides: object.__init__
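
One way the costfac and frac_exp parameters might combine is sketched below. The exact formula is not given in this documentation, so everything here is an assumption made for illustration: scaled_word_cost and levenshtein are hypothetical names, the distance is divided by the longer word's length raised to frac_exp, and None is treated as an empty word. The one property taken from the signature above is that the default frac_exp=0.0 leaves the distance unscaled, since any positive length raised to the power 0 is 1.

```python
def levenshtein(a, b):
    """Unit-cost character edit distance (stand-in for the word_cost role)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def scaled_word_cost(a, b, costfac=None, frac_exp=0.0):
    """Hypothetical combination of the documented parameters."""
    sa = a or ""                      # ASSUMPTION: None acts like an empty word
    sb = b or ""
    raw = levenshtein(sa, sb)
    longest = max(len(sa), len(sb), 1)
    scaled = raw / (longest ** frac_exp)   # ASSUMPTION: divide by length**frac_exp
    factor = 1.0 if costfac is None else costfac(a, b)
    return factor * scaled


def double_indels(a, b):
    """Example costfac: make whole-word insertions/deletions twice as costly."""
    return 2.0 if a is None or b is None else 1.0


print(scaled_word_cost("cat", "cart"))                        # default: unscaled
print(scaled_word_cost("cat", "cart", frac_exp=1.0))          # length-normalised
print(scaled_word_cost(None, "cat", costfac=double_indels))   # insertion, weighted
```

Under these assumptions, a frac_exp near 1 makes a one-letter change in a long word cheaper than the same change in a short word, while costfac lets the caller reweight particular word pairs.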

__call__(self, a, b)
(Call operator)

To be used by distf.

Parameters:
  • a (str or None)
  • b (str or None)
Returns: a cost (presumably a float)
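
The calling convention above (a and b are words, and either may be None for an insertion or deletion) can be illustrated with a small word-list alignment. The signature of distf is not shown on this page, so align_cost below is a hypothetical stand-in rather than distf itself, and simple_cost is a toy cost callable used in place of a text_cost instance.

```python
def simple_cost(a, b):
    """Toy cost callable following the (a, b) convention; either may be None."""
    if a is None or b is None:
        return 1.0          # insertion or deletion of a whole word
    if a == b:
        return 0.0          # identical words
    if a.lower() == b.lower():
        return 0.1          # words differ only in letter case
    return 1.0              # unrelated words


def align_cost(xs, ys, cost):
    """Hypothetical stand-in for a word-level edit distance driver.

    Calls cost(x, None) for deletions, cost(None, y) for insertions,
    and cost(x, y) for substitutions.
    """
    prev = [0.0]
    for y in ys:
        prev.append(prev[-1] + cost(None, y))
    for x in xs:
        cur = [prev[0] + cost(x, None)]
        for j, y in enumerate(ys, start=1):
            cur.append(min(prev[j] + cost(x, None),
                           cur[j - 1] + cost(None, y),
                           prev[j - 1] + cost(x, y)))
        prev = cur
    return prev[-1]


old = "The quick brown fox".split()
new = "the quick red fox jumps".split()
print(align_cost(old, new, simple_cost))
```

Any object with a compatible __call__(a, b) method could be passed in place of simple_cost, which is how a class instance can serve as a cost function.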