Rank aggregation is a problem with many important applications, and naive approaches to it go wrong in subtle ways.

Let’s say that coverage of your national Quidditch league is dominated by five major wizarding sports newspapers. Yes, the ones with moving pictures and everything. Every week after the games, each of them publishes a ranking of the star players. For now, let’s suppose that the set of ranked players is always the same, since the problem becomes a bit more complicated otherwise.

**The Athletic Wizard**: Alicia Spinnet, Ginny Weasley, Gwendolyn Morgan, Robin Higgy, Debbie Muntz

**The Daily Prophet**: Alicia, Ginny, Robin, Gwendolyn, Debbie

**Quidditch News**: Robin, Ginny, Gwendolyn, Debbie, Alicia

**Seeker Weekly**: Gwendolyn, Ginny, Robin, Debbie, Alicia

**The Quibbler**: Debbie, Ginny, Robin, Gwendolyn, Alicia

As you can see, there’s quite a bit of disagreement and personal taste involved. You didn’t get to watch all of the games, but you’d like to decide who the best players were by somehow aggregating the opinions of the popular newspapers. An easy option would be to pretend that each newspaper *votes* for the player it ranks #1 and to ignore the rest of its ranking. As Alicia Spinnet is the only player with *two* first-place votes, she should be named best player, right?
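The plurality rule described above takes only a few lines to sketch. This is a minimal illustration, assuming the rankings are stored as Python lists of first names with the best player first; the dictionary `rankings` and the helper `plurality_winner` are hypothetical names introduced just for this example.

```python
from collections import Counter

# The five newspaper rankings from above, best player first.
rankings = {
    "The Athletic Wizard": ["Alicia", "Ginny", "Gwendolyn", "Robin", "Debbie"],
    "The Daily Prophet": ["Alicia", "Ginny", "Robin", "Gwendolyn", "Debbie"],
    "Quidditch News": ["Robin", "Ginny", "Gwendolyn", "Debbie", "Alicia"],
    "Seeker Weekly": ["Gwendolyn", "Ginny", "Robin", "Debbie", "Alicia"],
    "The Quibbler": ["Debbie", "Ginny", "Robin", "Gwendolyn", "Alicia"],
}

def plurality_winner(rankings):
    """Count only each newspaper's #1 pick and return the most-voted player."""
    first_places = Counter(ranking[0] for ranking in rankings.values())
    winner, votes = first_places.most_common(1)[0]
    return winner, votes, first_places

winner, votes, tally = plurality_winner(rankings)
print(winner, votes)  # Alicia 2
print(tally["Ginny"])  # 0: Ginny gets no first-place votes at all
```

Everything below position #1 is simply thrown away, which is exactly why Ginny does not register.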

Upon closer inspection, Alicia seems very controversial: loved by two of the newspapers but ranked dead last by the other three. Ginny, on the other hand, didn’t stand out as best player to anybody, but she was *uniformly* considered the runner-up. There should be some way to account for this. It would be nice to have a method for finding an *optimal* ranking, one that maximizes some measure of agreement with the opinions we are trying to aggregate.
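A quick way to check this intuition is to list the place each player receives in every ranking and average those places. This is only a rough heuristic (essentially a Borda-style score), not the optimal aggregation we are after; the helper `positions` is a hypothetical name used for illustration, and it assumes every newspaper ranks the same five players.

```python
# The five newspaper rankings from above, best player first.
rankings = [
    ["Alicia", "Ginny", "Gwendolyn", "Robin", "Debbie"],  # The Athletic Wizard
    ["Alicia", "Ginny", "Robin", "Gwendolyn", "Debbie"],  # The Daily Prophet
    ["Robin", "Ginny", "Gwendolyn", "Debbie", "Alicia"],  # Quidditch News
    ["Gwendolyn", "Ginny", "Robin", "Debbie", "Alicia"],  # Seeker Weekly
    ["Debbie", "Ginny", "Robin", "Gwendolyn", "Alicia"],  # The Quibbler
]

def positions(rankings):
    """Map each player to the list of places (1 = best) they receive."""
    players = rankings[0]  # assumes every ranking covers the same players
    return {p: [r.index(p) + 1 for r in rankings] for p in players}

pos = positions(rankings)
avg = {p: sum(places) / len(places) for p, places in pos.items()}

# Print players from best to worst average place.
for player in sorted(avg, key=avg.get):
    print(f"{player:<10} places={pos[player]} mean={avg[player]:.1f}")
```

On these five rankings Ginny comes out on top with places `[2, 2, 2, 2, 2]` and a mean of 2.0, while Alicia, despite her two first places, has places `[1, 1, 5, 5, 5]` and a mean of 3.4.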

## Kendall’s Tau distance

One of the most interesting ways to measure disagreement between rankings is based on the Tau statistic introduced by Kendall. The Tau distance counts the pairwise disagreements between two rankings: the number of pairs of items that one ranking puts in one order and the other ranking puts in the opposite order. Since this equals the number of swaps of adjacent items needed to turn one ranking into the other, it is sometimes called *bubble-sort distance*.
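To see why the bubble-sort name fits, the sketch below counts the disagreeing pairs directly and, separately, counts the adjacent swaps bubble sort performs when reordering one ranking into the other; the two numbers always agree. Both function names are hypothetical, and the sketch assumes the two rankings contain the same players with no ties.

```python
from itertools import combinations

def discordant_pairs(a, b):
    """Count pairs of players that rankings a and b put in opposite order."""
    pos_b = {player: i for i, player in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked above y in a;
    # the pair disagrees if b ranks y above x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])

def bubble_sort_swaps(a, b):
    """Count adjacent swaps bubble sort makes to reorder a into b's order."""
    pos_b = {player: i for i, player in enumerate(b)}
    keys = [pos_b[player] for player in a]  # ranking a, written as places in b
    swaps = 0
    for end in range(len(keys) - 1, 0, -1):
        for i in range(end):
            if keys[i] > keys[i + 1]:
                keys[i], keys[i + 1] = keys[i + 1], keys[i]
                swaps += 1
    return swaps

athletic_wizard = ["Alicia", "Ginny", "Gwendolyn", "Robin", "Debbie"]
daily_prophet = ["Alicia", "Ginny", "Robin", "Gwendolyn", "Debbie"]
quidditch_news = ["Robin", "Ginny", "Gwendolyn", "Debbie", "Alicia"]
quibbler = ["Debbie", "Ginny", "Robin", "Gwendolyn", "Alicia"]

print(discordant_pairs(athletic_wizard, daily_prophet))  # 1
print(bubble_sort_swaps(athletic_wizard, daily_prophet))  # 1
print(discordant_pairs(quidditch_news, quibbler))  # 4
print(bubble_sort_swaps(quidditch_news, quibbler))  # 4
```

The Athletic Wizard and The Daily Prophet differ only in the order of Gwendolyn and Robin, so a single adjacent swap separates them. At the other extreme, a ranking of five players and its exact reverse disagree on all 5·4/2 = 10 pairs.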

While the closely related Tau correlation coefficient is implemented in SciPy as `scipy.stats.kendalltau`, let’s code it ourselves in a simpler way.