# Machine learning/Classification algorithms

Classification is a subcategory of supervised learning problems.

## k-nearest neighbor

• a simple classification algorithm
• Intuition: Find the majority vote in the training data
• This is a discriminative model, meaning that there is no way to generate the training data points

### Algorithm

• Define some distance metric or similarity metric. The simplest case is Euclidean distance.
• Given some input point ${\displaystyle x}$, find the ${\displaystyle k}$'th nearest neighbors from the training set.
• Do a majority vote between these nearest neighbor list and classify the input point as the category with highest number of vote.

### Probabilistic interpretation

Consider the classification output as a random variable ${\displaystyle y}$. Define probability of ${\displaystyle y}$ given input ${\displaystyle x}$ and training data ${\displaystyle D}$ is

${\displaystyle P(y|x,D)={\text{fraction of points }}x_{i}{\text{ in }}k{\text{-th nearest neighbor points to }}x{\text{ such that }}y_{i}=y}$
The output of the classification is

${\displaystyle {\hat {y}}=\arg \max _{y}P(y|x,D)}$