Classifier architecture and Features.1

Greg Kochanski

February 14, 2004

To talk about ``close'' and ``far'' in a consistent manner, we need to have the concept of a distance metric. A distance metric, $d(A,B)$ is a function or algorithm for calculating a distance between two things, $A$ and $B$. It has three properties:

  1. It is always positive or zero.
  2. The distance from a thing2 to itself is zero.
  3. It obeys the triangle inequality: For any three points, $A$, $B$, and $C$, $d(A,B) + d(B,C) \ge d(A,C)$ for any possible choice of $B$. In other words, the straight line between $A$ and $C$, which has a length $d(A,C)$, is shorter3 than any other path between $A$ and $C$, such as a path that goes by way of $B$.
Anything that obeys these three properties is a distance metric. Common examples are Euclidean distance in two dimensions:
d_{Euclid}(A,B) = {\left ( (A_x - B_x)^2 + (A_y - B_y)^2 \right ) }^{1/2},
\end{displaymath} (1)

and city block metric [#!cityblock!#]:
d_{Euclid}(A,B) = \left ( \vert A_x - B_x\vert + \vert A_y - B_y\vert \right ).
\end{displaymath} (2)

(The Euclidean and city-block metrics generalize to any number of dimensions in a straightforward manner.)

Greg Kochanski 2004-02-14