### Introduction

- K-Nearest Neighbor (*KNN*) is a nonparametric method for classifying data points. It is simple to understand and straightforward to implement.

### Details

- Prerequisites: a training set of $N$ samples/instances:

$$

\mathcal{D}^{Train} = \{(\textbf{x}_1, y_1), (\textbf{x}_2, y_2), (\textbf{x}_3, y_3), \cdots, (\textbf{x}_N, y_N)\}

$$

we let

$$

nn_k(\textbf{x}) = \text{the } k\text{-th nearest neighbor of } \textbf{x}

$$

$$

knn(\textbf{x}) = \{nn_1(\textbf{x}), nn_2(\textbf{x}), \ldots, nn_k(\textbf{x})\} = \text{the set of the } k \text{ nearest neighbors of } \textbf{x}

$$
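As a minimal sketch of these two definitions (assuming Euclidean distance, NumPy arrays, and a hypothetical helper name `knn_indices`), sorting the training points by their distance to **x** yields both $nn_k(\textbf{x})$ (the $k$-th index in the sorted order) and $knn(\textbf{x})$ (the first $k$ indices):

```python
import numpy as np

def knn_indices(X_train, x, k):
    """Indices of the k training samples nearest to x (nearest first)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every sample
    order = np.argsort(dists)                    # ascending, so order[0] indexes nn_1(x)
    return order[:k]                             # knn(x) as training-set indices
```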

Then the rule for assigning a label to **x** is:

$$

y = f(\textbf{x}) = \operatorname{argmax}_{c \in [C]} \sum_{n=1}^{N} \mathbb{I}\left(\textbf{x}_n \in knn(\textbf{x}),\; y_n = c\right)

$$
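The decision rule itself is then a majority vote over those $k$ indices. A minimal sketch, reusing the hypothetical `knn_indices` helper above:

```python
from collections import Counter

def knn_classify(X_train, y_train, x, k):
    """Majority vote: argmax over classes c of sum_n I(x_n in knn(x), y_n = c)."""
    neighbors = knn_indices(X_train, x, k)          # indices of the k nearest samples
    votes = Counter(y_train[i] for i in neighbors)  # tally neighbor labels
    return votes.most_common(1)[0][0]               # class with the most votes
```

For example, with `X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5]])`, `y_train = np.array([0, 0, 0, 1])`, and `k = 3`, the query `np.array([0.2, 0.1])` is classified as `0`, since all three of its nearest neighbors carry that label.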

### Explanation

- The $k$ nearest neighbors vote on the query point's label, and the majority class wins. This is the core intuition behind KNN.
- If the top two classes receive the same number of votes from the $k$ nearest neighbors, a common tie-breaking rule is to choose the label of the closer neighbor (see the sketch below).
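A sketch of that tie-breaking rule, under the same assumptions: because the hypothetical `knn_indices` helper returns neighbors sorted nearest-first, scanning them in order and returning the first label belonging to a tied class picks the class of the closer neighbor:

```python
from collections import Counter

def knn_classify_tiebreak(X_train, y_train, x, k):
    """Majority vote; on ties, prefer the class of the closer neighbor."""
    neighbors = knn_indices(X_train, x, k)            # sorted nearest-first
    votes = Counter(y_train[i] for i in neighbors)    # tally neighbor labels
    best = max(votes.values())                        # winning vote count
    tied = {c for c, v in votes.items() if v == best}
    for i in neighbors:                               # nearest-first scan
        if y_train[i] in tied:                        # first tied class seen wins
            return y_train[i]
```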