Machine learning/Unsupervised Learning/K-means Clustering


K-means is a clustering method; clustering is an unsupervised learning problem.

  • In this method the number of clusters, $K$, is an input to the algorithm (a hyper-parameter).
  • K-means is a greedy algorithm. It is a special case of the expectation maximization (EM) algorithm, in which we try to find a maximum likelihood estimate (MLE).
  • The K-means algorithm is not guaranteed to converge to the global minimum of the loss function (the sum of squared Euclidean distances of the points from their assigned cluster centers); finding the global minimum is an NP-hard problem. A common mitigation is to run the algorithm from several random initializations and keep the best result, as in the sketch after this list.
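As a minimal illustration of these points, the following sketch assumes NumPy and scikit-learn are available; the blob data X is hypothetical. The number of clusters is passed in explicitly, and several random restarts are used, keeping the run with the lowest loss (inertia) to reduce the risk of ending in a poor local minimum.

  import numpy as np
  from sklearn.cluster import KMeans

  # Hypothetical 2-D data: three blobs of 100 points each.
  rng = np.random.default_rng(0)
  X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                 for c in [(0, 0), (5, 0), (0, 5)]])

  # K is a hyper-parameter supplied by the user; n_init restarts the greedy
  # algorithm from several random initializations and keeps the run with the
  # lowest loss, mitigating convergence to a poor local minimum.
  km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

  print(km.inertia_)          # loss value at the returned (local) minimum
  print(km.cluster_centers_)  # the K cluster centers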

Relation to Gaussian Mixture Model (GMM)

  • GMM is a more probabilistic approach to clustering.
  • The expectation maximization (EM) algorithm is used to find a good Gaussian mixture model to cluster the data (see the sketch after this list).
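As a brief sketch of this relation (assuming scikit-learn; the data X is the same hypothetical blob data as in the earlier sketch), EM fits a Gaussian mixture and yields soft responsibilities for each point, whereas K-means makes hard assignments.

  import numpy as np
  from sklearn.mixture import GaussianMixture

  # Hypothetical data: the same three Gaussian blobs as in the earlier sketch.
  rng = np.random.default_rng(0)
  X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                 for c in [(0, 0), (5, 0), (0, 5)]])

  # EM fits a mixture of K Gaussians; predict_proba returns soft
  # "responsibilities" (probabilities of belonging to each component),
  # in contrast to the hard assignments produced by K-means.
  gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
  responsibilities = gmm.predict_proba(X)   # shape (300, 3)
  hard_labels = responsibilities.argmax(axis=1)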

Intuition

Data points in each cluster are closer to the center of their own cluster than to the centers of the other clusters.

Algorithm

Assume that the number of clusters is given by $K$ and the cluster centers are denoted by $\mu_k$, for $k = 1, \dots, K$.

We define a loss function for clustering and try to minimize it through the following greedy algorithm.

The loss function is defined as

  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2 ,

where $x_1, \dots, x_N$ are the data points and $r_{nk} \in \{0, 1\}$ is an assignment indicator that equals 1 if point $x_n$ is assigned to cluster $k$ and 0 otherwise.
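As a minimal sketch (the function name and arguments are illustrative, not part of any standard API), the loss $J$ can be computed directly from the data, the hard assignments, and the centers:

  import numpy as np

  def kmeans_loss(X, labels, centers):
      """K-means loss J: sum of squared Euclidean distances between each
      point X[n] and the center of the cluster it is assigned to."""
      return float(np.sum((X - centers[labels]) ** 2))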

Minimize $J$ with respect to $r_{nk}$ and $\mu_k$ by following these two steps until convergence is achieved (a NumPy sketch of both steps appears after this list):

  1. Choose the optimal $r_{nk}$ for fixed $\mu_k$ by assigning each point $x_n$ to the nearest center $\mu_k$.
  2. Choose the optimal $\mu_k$ for fixed $r_{nk}$ by updating each $\mu_k$ to be the empirical mean of the points assigned to cluster $k$.
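The two steps can be sketched in NumPy as follows; this is an illustrative implementation under simple assumptions (random initialization from the data points, a fixed iteration cap), not the code of any particular library.

  import numpy as np

  def kmeans(X, K, n_iters=100, seed=0):
      """Greedy two-step K-means on data X of shape (N, D).

      Returns the K cluster centers and the cluster index of each point."""
      rng = np.random.default_rng(seed)
      # Initialize the centers with K distinct data points chosen at random.
      centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
      for _ in range(n_iters):
          # Step 1: assign each point to its nearest center
          # (the optimal r_nk for the current, fixed mu_k).
          dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
          labels = dists.argmin(axis=1)
          # Step 2: move each center to the empirical mean of its assigned
          # points (the optimal mu_k for the current, fixed r_nk).
          new_centers = np.array([X[labels == k].mean(axis=0)
                                  if np.any(labels == k) else centers[k]
                                  for k in range(K)])
          if np.allclose(new_centers, centers):
              break  # centers stopped moving: converged to a local minimum
          centers = new_centers
      return centers, labels

For example, centers, labels = kmeans(X, K=3) on the hypothetical blob data from the earlier sketch should recover three centers near (0, 0), (5, 0) and (0, 5).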

Justification

In this section, we show that choosing the cluster centers $\mu_k$ according to step 2 of the algorithm minimizes the loss function $J$ for a fixed set of assignment factors $r_{nk}$.

In order to find the minimum value of the loss function $J$ as a function of $\mu_k$, we find the point at which the gradient of $J$ with respect to $\mu_k$ is zero.

Therefore, we have

  \frac{\partial J}{\partial \mu_k} = -2 \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k) = 0 .

Getting rid of the factor of 2 in the last expression, we have

  \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k) = 0 .

We also have $\sum_{n} r_{nk} \, \mu_k = \mu_k \sum_{n} r_{nk}$, which simplifies to

  \mu_k = \frac{\sum_{n=1}^{N} r_{nk} \, x_n}{\sum_{n=1}^{N} r_{nk}} ,

that is, each center $\mu_k$ is the empirical mean of the points assigned to cluster $k$, exactly the update used in step 2 of the algorithm.
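As a small numerical sanity check of this result (a sketch with hypothetical data, not taken from the original text), moving a center away from the empirical mean of its assigned points never decreases the within-cluster sum of squares:

  import numpy as np

  rng = np.random.default_rng(1)
  points = rng.normal(size=(50, 2))   # points assigned to one cluster
  mu = points.mean(axis=0)            # the empirical mean chosen in step 2

  def within_cluster_loss(center):
      return np.sum((points - center) ** 2)

  # Any perturbed center gives a loss at least as large as the mean does.
  perturbed = mu + rng.normal(scale=0.5, size=2)
  assert within_cluster_loss(mu) <= within_cluster_loss(perturbed)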