    Word mover’s distance classification in Python

    A guide to scikit-learn-compatible nearest-neighbors classification using the recently introduced word mover’s distance (WMD). Joint post with the awesome Matt Kusner!

    Source of this Jupyter notebook.

    In document classification and other natural language processing applications, a good measure of the similarity between two texts is a valuable building block. Ideally, such a measure would capture semantic information. Cosine similarity on bag-of-words vectors is known to do well in practice, but it inherently cannot capture cases where two documents say the same thing in completely different words.
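To make this limitation concrete, here is a minimal sketch in plain Python. The two example sentences, the tiny stop-word list, and the helper names `bag_of_words` and `cosine_similarity` are all assumptions chosen for illustration; a real pipeline would more likely use a library vectorizer. Once stop words are dropped, the two paraphrases share no tokens, so their cosine similarity is exactly zero even though they mean nearly the same thing.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "to", "in", "a", "of"}


def bag_of_words(text):
    """Lowercase, split on whitespace, drop stop words, and count tokens."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_1 = bag_of_words("Obama speaks to the media in Illinois")
doc_2 = bag_of_words("The President greets the press in Chicago")
doc_3 = bag_of_words("Obama speaks in Illinois")

# Same meaning, disjoint vocabulary: the bag-of-words vectors are orthogonal.
sim_paraphrase = cosine_similarity(doc_1, doc_2)

# Shared vocabulary: similarity is high because the same tokens are reused.
sim_overlap = cosine_similarity(doc_1, doc_3)

print(f"paraphrase: {sim_paraphrase:.3f}")
print(f"overlap:    {sim_overlap:.3f}")
```

The paraphrase pair scores 0 while the pair that reuses words scores about 0.87, so the measure rewards shared vocabulary, not shared meaning.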