Word mover’s distance classification in Python
A guide to scikit-learn-compatible nearest-neighbors classification using the recently introduced word mover’s distance (WMD). Joint post with the awesome Matt Kusner!
In document classification and other natural language processing applications, a good measure of the similarity between two texts is a valuable building block. Ideally, such a measure would capture semantic information. Cosine similarity on bag-of-words vectors is known to do well in practice, but it inherently cannot detect when two documents say the same thing in completely different words: if they share no terms, their vectors are orthogonal and the similarity is exactly zero.
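To make this limitation concrete, here is a minimal sketch using only the standard library. The example sentences, the tiny stop word list, and the helper names `bag_of_words` and `cosine_similarity` are illustrative assumptions, not part of any library; a real pipeline would more likely use a vectorizer from scikit-learn.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard one).
STOP_WORDS = {"the", "to", "in", "a", "an", "of"}


def bag_of_words(text):
    """Lowercase, split into alphabetic tokens, drop stop words, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = bag_of_words("Obama speaks to the media in Illinois")
d2 = bag_of_words("The President greets the press in Chicago")
d3 = bag_of_words("Obama speaks in Illinois")

print(cosine_similarity(d1, d2))  # same meaning, no shared content words
print(cosine_similarity(d1, d3))  # shared content words
```

The first pair is close in meaning but shares no content words, so its similarity is zero, while the second pair scores high purely because of word overlap. This is the gap a semantically informed distance aims to close.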