K Nearest Neighbor

Introduction

  • K Nearest Neighbor (KNN) is a nonparametric method for classifying data points. It is simple and convenient to implement.

Details

  • Prerequisites: N samples/instances:

$$
\mathcal{D}^{Train} = \{(\textbf{x}_1, y_1), (\textbf{x}_2, y_2), (\textbf{x}_3, y_3), (\textbf{x}_4, y_4), \cdots, (\textbf{x}_N, y_N)\}
$$

we let
$$
nn_k(\textbf{x}) = k\text{th nearest neighbor of } \textbf{x}
$$

$$
knn(\textbf{x}) = \text{top } k \text{ nearest neighbors of } \textbf{x}
$$

Then the rule for classifying x into a specific label is:
$$
y = f(\textbf{x}) = \text{argmax}_{c \in [C]} \sum\limits_{n=1}^{N} \mathbb{I}(\textbf{x}_n \in knn(\textbf{x}), y_n = c)
$$
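
The voting rule above translates directly into code. Here is a minimal sketch in Python, assuming NumPy arrays and Euclidean distance; the function and variable names (knn_predict, X_train, y_train) are illustrative, not from the original text:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Distances from x to every training sample (Euclidean, by assumption)
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors: knn(x)
    nn_idx = np.argsort(dists)[:k]
    # argmax_c of the vote count: sum_n I(x_n in knn(x), y_n = c)
    labels, counts = np.unique(y_train[nn_idx], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two clusters, with the query point near the second one
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.9, 1.0]), k=3))  # -> 1
```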

Explanation

  • Let the K nearest neighbors vote to determine the current point's label; this is the intuition behind K nearest neighbors.
  • If the top two categories receive the same number of votes from the K nearest neighbors, break the tie by choosing the label of the closer neighbor (see the sketch after this list).
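
As a sketch of this tie-breaking rule, under the same assumptions as before (the helper name knn_predict_tiebreak is hypothetical):

```python
import numpy as np

def knn_predict_tiebreak(X_train, y_train, x, k=4):
    """Majority vote; ties go to the label of the closest tied neighbor."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]   # neighbors ordered closest to farthest
    votes = y_train[nn_idx]
    labels, counts = np.unique(votes, return_counts=True)
    tied = set(labels[counts == counts.max()])
    # Walk the neighbors from closest to farthest; the first vote that
    # belongs to a tied (maximum-count) category decides the label.
    for label in votes:
        if label in tied:
            return label
```

Because the neighbors are scanned in order of increasing distance, the closer neighbor's label wins whenever two categories are tied; when there is no tie, the loop returns the unique majority label.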