[![Dictionary learned with K-Means on the LFW dataset with whitening PCA][]][][![Dictionary learned with K-Means on the LFW dataset without whitening PCA][]][]

One of the simplest, and yet most heavily constrained form of matrix factorization, is vector quantization (VQ). Heavily used in image/video compression, the VQ problem is a factorization [latex] X=WH[/latex] where [latex] H[/latex] (our dictionary) is called the codebook and is designed to cover the cloud of data points effectively, and each line of [latex] W[/latex] is a unit vector.

This means that each each data point [latex] x_i[/latex] is approximated as [latex] x_i \approx h_{k} = \sum_{j=1}\^{r} \delta_{kj}h_{j}[/latex]. In other words, the closest row vector (codeword/dictionary atom) [latex] h_k[/latex] of [latex] H[/latex] is chosen as an approximation, and this is encoded as a unit vector [latex] (\delta_{k1}, …, \delta_{kr})[/latex]. The data representation [latex] W[/latex] is composed of such vectors.

There is a variation called gain-shape VQ where instead of approximating
each point as one of the codewords, we allow a scalar multiplication
invariance: [latex] x_i \approx \alpha_ih_k[/latex]. This model
requires considerably more storage (each data point needs a floating
point number and an unsigned index, as opposed to just the index), but
it leads to a much better approximation.

Gain-shape VQ can equivalently be accomplished by normalizing each data
vector prior to fitting the codebook.

In order to fit a codebook [latex] H[/latex] for efficient VQ use, the K-Means Clustering [1] algorithm is a natural thought. K-means is an iterative algorithm that incrementally improves the dispersion of k cluster centers in the data space until convergence. The cluster centers are initialized in a random or procedural fashion, then, at each iteration, the data points are assigned to the closest cluster center, which is subsequently moved to the center of the points assigned to it.

The `scikits.learn.decomposition.KMeansCoder`

object from our
work-in-progress dictionary learning toolkit can learn a dictionary from
image patches using the K-Means algorithm, with optional local contrast
normalization and a PCA whitening transform. Using a trained object to
transform data points with orthogonal matching pursuit, with the
parameter `n_atoms=1`

is equivalent to gain-shape VQ. Of course you are
free to use any method of sparse coding such as LARS. The code used to
produce the example images on top of this post can be found in [2].

Using K-Means for learning the dictionary does not optimize over linear combinations of dictionary atoms, like standard dictionary learning methods do. However, it’s considerably faster, and Adam Coates and Andrew Ng suggest in [3] that as long as the dictionary is filled with a large enough number of atoms and it covers well enough the cloud of data (and of future test data) points, then K-Means, or even random sampling of image patches, can perform remarkably well for some tasks.

[![Dictionary learned with K-Means on the LFW dataset with whitening PCA][]]: http://localhost:8001/wp-content/uploads/2011/07/kmeans_w.png

[![Dictionary learned with K-Means on the LFW dataset without whitening PCA][]]: http://localhost:8001/wp-content/uploads/2011/07/kmeans_no_w.png

## Comments !