Kemeny-Young Optimal Rank Aggregation in Python

Rank aggregation is a problem with many important applications and naive approaches to it go wrong in subtle ways.

Let's say that your national Quidditch league is dominated by five major wizard sports newspapers. Yes, the ones with moving images and everything. Every week after the games, each of them publishes a ranking of the star players. For now, let's suppose that the set of players under investigation is always the same, as the problem becomes a bit more complicated otherwise.

The Athletic Wizard: Alicia Spinnet, Ginny Weasley, Gwendolyn Morgan, Robin Higgy, Debbie Muntz
The Daily Prophet: Alicia, Ginny, Robin, Gwendolyn, Debbie
Quidditch News: Robin, Ginny, Gwendolyn, Debbie, Alicia
Seeker Weekly: Gwendolyn, Ginny, Robin, Debbie, Alicia
The Quibbler: Debbie, Ginny, Robin, Gwendolyn, Alicia

As you can see, there's quite a bit of disagreement and personal taste involved. You didn't get to watch all of the games, but you'd like to make a decision on who the best players were, by somehow aggregating the opinions of the popular newspapers. An easy option would be to pretend that each newspaper votes for the player that they rank #1, and ignore the rest. As Alicia Spinnet is the only player getting two nominations for best player, she should win best player, right?

Upon closer inspection, Alicia seems very controversial, loved by two but hated by five of the newspaper. Ginny, on the other hand, didn't stand out as best player to anybody, but she was uniformly considered runner-up. There should be some way to account for this. It would be nice if we would have a method of finding an optimal ranking that maximizes some sort of agreement with the opinions we are trying to aggregate.

Kendall's Tau distance

One of the most interesting ways to measure disagreement between rankings is the Tau statistic introduced by Kendall. It essentially measures the number of pairwise disagreements between two rankings. Since you can think of it as the number of flips you need to perform on a ranking to turn it into the other, it is sometimes called bubble-sort distance.

While the closely-related Tau correlation coefficient is implemented in Scipy as scipy.stats.kendalltau, let's code it ourselves in a simpler way.

In [1]:
from __future__ import print_function
import numpy as np
from itertools import combinations, permutations

def kendalltau_dist(rank_a, rank_b):
    tau = 0
    n_candidates = len(rank_a)
    for i, j in combinations(range(n_candidates), 2):
        tau += (np.sign(rank_a[i] - rank_a[j]) ==
                -np.sign(rank_b[i] - rank_b[j]))
    return tau

Now we can represent a rank as a numpy vector, with missing values.

In [2]:
# columns in order of appearance:
cols = "Alicia Ginny Gwendolyn Robin Debbie".split()
ranks = np.array([[0, 1, 2, 3, 4],
                  [0, 1, 3, 2, 4],
                  [4, 1, 2, 0, 3],
                  [4, 1, 0, 2, 3],
                  [4, 1, 3, 2, 0]])

Two rankings that agree completely should have a distance of 0.

In [3]:
kendalltau_dist(ranks[0], ranks[0])

The Athletic Wizard seems to be closer to the Daily Prophet than to Seeker Weekly.

In [4]:
print('d(athletic_wizard, prophet) = {}'.format(kendalltau_dist(ranks[0], ranks[1])))
print('d(athletic_wizard, seeker) = {}'.format(kendalltau_dist(ranks[0], ranks[3])))
d(athletic_wizard, prophet) = 1
d(athletic_wizard, seeker) = 5

Kemeny-Young rank aggregation

Now that we have a distance metric, we can formulate a loss function to minimize in rank-space. We are looking for a ranking $\hat\tau$ that satisfies $$ \sum_i d(\hat\tau, \tau_i) \leq \sum_i d(\tau, \tau_i) \text{ for all } \tau $$

This aggregation method was proposed by John Kemeny [1], and later shown by Peyton Young to be a maximum likelihood estimator of pairwise preferences under the assumption that a voter will randomly flip two candidates from the true ranking is the same $p < 0.5$ [2].

For rankings of small length, one way to compute this optimal aggregation is by comparing the scores of all possible rankings, in other words, a brute-force approach:

In [5]:
def rankaggr_brute(ranks):
    min_dist = np.inf
    best_rank = None
    n_voters, n_candidates = ranks.shape
    for candidate_rank in permutations(range(n_candidates)):
        dist = np.sum(kendalltau_dist(candidate_rank, rank) for rank in ranks)
        if dist < min_dist:
            min_dist = dist
            best_rank = candidate_rank
    return min_dist, best_rank
In [6]:
dist, aggr = rankaggr_brute(ranks)
print("A Kemeny-Young aggregation with score {} is: {}".format(
    ", ".join(cols[i] for i in np.argsort(aggr))))
A Kemeny-Young aggregation with score 15 is: Ginny, Robin, Gwendolyn, Debbie, Alicia

Integer Programming formulation

The brute-force approach is, as you can see, simple to understand and quick to code. However, the number of total ranks of size $n$ is $n!$, so this approach quickly becomes infeasible for real-world problems. Unfortunately, it turns out that this problem (along with many related problems in rank-world) is NP-hard even for only four voters [3]. There have been quite a few approaches and approximations. In many cases, such as search results aggregation, approximations are good enough. For the cases where an exact solution is required, a formulation as a constrained integer program is given in [4].

We build a weighted directed graph $G = (V, E)$ with the candidates as vertices. The edges are defined as such: for every pair of candidates $i, j$, let $\#\{i > j\}$ denote the number of voters who rank $i$ higher than $j$. Draw an edge $e$ between each $i, j$ with weight $w_e = |\#\{i > j\} - \#\{j > i\}|$ (if nonzero, of course). The orientation of the edge is from the less preferred to the more preferred node.

The formulation is based on the alternative interpretation of the Kemeny optimal aggregation as the ranking that minimizes the weights of edges it disagrees with:

$$ \operatorname{minimize} \sum_{e \in E} w_e x_e \ \operatorname{subject to} \ \forall i \neq j \in V, x_{ij} + x_{ji} = 1 \ \forall i \neq j \neq k \neq i \in V, x_{ij} + x_{jk} + x_{ki} \geq 1 $$

In the above problem, all variables are integer, and effectively binary under the other constraints. The interpretation is that $x_{ij} = 1$ if, in the aggregated rank, $i$ is ranked lower than $j$.

The constraints essentially impose that the variables define a total order. The first set of constraints enforce antisymmetry and totality: either $i$ is ranked lower than $j$ or the other way around. The second set of constraints enforce transitivity.

In [7]:
def _build_graph(ranks):
    n_voters, n_candidates = ranks.shape
    edge_weights = np.zeros((n_candidates, n_candidates))
    for i, j in combinations(range(n_candidates), 2):
        preference = ranks[:, i] - ranks[:, j]
        h_ij = np.sum(preference < 0)  # prefers i to j
        h_ji = np.sum(preference > 0)  # prefers j to i
        if h_ij > h_ji:
            edge_weights[i, j] = h_ij - h_ji
        elif h_ij < h_ji:
            edge_weights[j, i] = h_ji - h_ij
    return edge_weights

[[ 0.  0.  0.  0.  0.]
 [ 1.  0.  3.  3.  3.]
 [ 1.  0.  0.  0.  3.]
 [ 1.  0.  1.  0.  3.]
 [ 1.  0.  0.  0.  0.]]

Now, let's solve the linear program with the LGPL-licensed lpsolve. It conveniently comes with a Python interface.

In [8]:
from lp_solve import lp_solve

def rankaggr_lp(ranks):
    """Kemeny-Young optimal rank aggregation"""

    n_voters, n_candidates = ranks.shape
    # maximize c.T * x
    edge_weights = _build_graph(ranks)
    c = -1 * edge_weights.ravel()  

    idx = lambda i, j: n_candidates * i + j

    # constraints for every pair
    pairwise_constraints = np.zeros(((n_candidates * (n_candidates - 1)) / 2,
                                     n_candidates ** 2))
    for row, (i, j) in zip(pairwise_constraints,
                           combinations(range(n_candidates), 2)):
        row[[idx(i, j), idx(j, i)]] = 1

    # and for every cycle of length 3
    triangle_constraints = np.zeros(((n_candidates * (n_candidates - 1) *
                                     (n_candidates - 2)),
                                     n_candidates ** 2))
    for row, (i, j, k) in zip(triangle_constraints,
                              permutations(range(n_candidates), 3)):
        row[[idx(i, j), idx(j, k), idx(k, i)]] = 1

    constraints = np.vstack([pairwise_constraints, triangle_constraints])
    constraint_rhs = np.hstack([np.ones(len(pairwise_constraints)),
    constraint_signs = np.hstack([np.zeros(len(pairwise_constraints)),  # ==
                                  np.ones(len(triangle_constraints))])  # >=

    obj, x, duals = lp_solve(c, constraints, constraint_rhs, constraint_signs,
                             xint=range(1, 1 + n_candidates ** 2))

    x = np.array(x).reshape((n_candidates, n_candidates))
    aggr_rank = x.sum(axis=1)

    return obj, aggr_rank
In [9]:
_, aggr = rankaggr_lp(ranks)
score = np.sum(kendalltau_dist(aggr, rank) for rank in ranks)
print("A Kemeny-Young aggregation with score {} is: {}".format(
    ", ".join(cols[i] for i in np.argsort(aggr))))
A Kemeny-Young aggregation with score 15 is: Ginny, Robin, Gwendolyn, Debbie, Alicia

We get the same result as in the brute-force case. However, it's much faster. Let's verify this, and in the process, also test that we always get the same result.

In [10]:
from time import time
import pandas as pd

timings = pd.DataFrame(columns="method rank_len n_ranks time".split())
for rank_len in range(4, 9):
    for n_ranks in (5, 20):
        ranks = [np.random.permutation(rank_len) for _ in range(n_ranks)]
        ranks = np.array(ranks)
        t0 = time()
        brute_score, brute_aggr = rankaggr_brute(ranks)
        timings = timings.append(dict(method="brute",
                                      time=time() - t0),
        t0 = time()
        _, lp_aggr = rankaggr_lp(ranks)
        timings = timings.append(dict(method="lp",
                                      time=time() - t0),
        assert (brute_score ==
                np.sum(kendalltau_dist(lp_aggr, rank) for rank in ranks))

# lp is much faster, let's run it for longer rankings.
for rank_len in range(9, 16):
    for n_ranks in (5, 20):
        t0 = time()
        _, lp_aggr = rankaggr_lp(ranks)
        timings = timings.append(dict(method="lp",
                                      time=time() - t0),
In [11]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(7, 6))
plt.xticks(range(4, 16), range(4, 16))
for subplot, n_ranks in enumerate((5, 20)):
    for method in ('brute', 'lp'):
        this_rows = timings[(timings.n_ranks == n_ranks) &
                            (timings.method == method)]
        plt.semilogy(this_rows.rank_len, this_rows.time,
                     label="{} n={}".format(method, n_ranks))
plt.legend(loc="upper left")
plt.suptitle("Ranking size vs. log time to solve")

You can see that the brute force approach is a bit worse than exponential, while the integer programming approach is more reasonable and also is less sensitive to the number of voters being aggregated. We therefore have a reasonably fast efficient exact solution to the rank aggregation problem on small datasets.


[1] J Kemeny. Mathematics without numbers, 1959
[2] HP Young. Condorcet's theory of voting, 1988
[3] JJ Bartholdi III, CA Tovey, MA Trick. The computational difficulty of manipulating an election, 1989
[4] V Conitzer, A Davenport, J Kalagnanam. Improved bounds for computing Kemeny rankings, 2006

Comments !