k-Nearest Neighbors Algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric machine learning method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
· In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
· In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors. (Both cases are illustrated in the short sketch below.)
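To make this concrete, here is a minimal sketch of both modes in plain Python. The helper names (euclidean, k_nearest, knn_classify, knn_regress), the choice of Euclidean distance, and the toy data are assumptions made for illustration, not part of any particular library.

import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train, query, k):
    """Return the k training pairs (vector, label-or-value) closest to query."""
    return sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Majority vote over the labels of the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Average of the values of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Toy data: (feature vector, label) for classification
points = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]
print(knn_classify(points, (1.1, 1.0), k=3))  # -> "A" (two of the three nearest are A)

# Toy data: (feature vector, value) for regression
samples = [((1.0,), 2.0), ((2.0,), 3.0), ((3.0,), 10.0)]
print(knn_regress(samples, (1.5,), k=2))      # -> 2.5 (average of 2.0 and 3.0)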
Given N training vectors, the k-nearest neighbor algorithm identifies the k nearest neighbors of a new element c, regardless of labels. When k = 1, each training vector defines a region in space, yielding a Voronoi partition of the space.
Remarks (both rules reduce the chance of tied votes):
1. Choose an odd value of k for a two-class problem.
2. k must not be a multiple of the number of classes.
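A quick illustration of why these rules matter: with two classes and an even k, the neighbor vote can end in a tie. The snippet below only demonstrates the counting; the labels are made up.

from collections import Counter

# Hypothetical labels of the k nearest neighbors in a 2-class problem
votes_k4 = Counter(["spam", "spam", "ham", "ham"])  # k = 4: a 2-2 tie can occur
votes_k3 = Counter(["spam", "spam", "ham"])         # k = 3 (odd): a majority always exists

print(votes_k4.most_common())   # [('spam', 2), ('ham', 2)] -- no clear winner
print(votes_k3.most_common(1))  # [('spam', 2)] -- unambiguous majority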
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor.
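Here is a minimal sketch of that 1/d scheme for regression, under the same conventions as the earlier sketch. The small eps guard against division by zero is my own addition, for the case where the query coincides with a training point.

import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress_weighted(train, query, k, eps=1e-9):
    """k-NN regression where each neighbor's value is weighted by 1/d."""
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    # eps guards against division by zero when query equals a training point
    weights = [1.0 / (euclidean(vec, query) + eps) for vec, _ in neighbors]
    weighted_sum = sum(w * val for w, (_, val) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

train = [((1.0,), 2.0), ((2.0,), 3.0), ((4.0,), 10.0)]
# The neighbor at distance 0.2 (value 2.0) outweighs the one at 0.8 (value 3.0)
print(knn_regress_weighted(train, (1.2,), k=2))  # -> 2.2, vs. 2.5 unweighted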
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
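For comparison, here is the same classification example using scikit-learn (assuming it is installed). Note that fit() does little more than store and index the training data; the real work happens at prediction time, which reflects the lazy-learning character described above.

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
y_train = ["A", "A", "B", "B"]

# fit() mostly just stores the training set; predict() does the neighbor search
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.predict([[1.1, 1.0]]))  # -> ['A']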
Disadvantages of the k-Nearest Neighbor Algorithm
1. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.
2. The main drawback of k-NN is the computational cost of searching for the nearest neighbors of each sample.