Overview
This note is mostly based on three books below. When necessary, I provide additional references in the last section.
- Li, Hang. “Learning to Rank for Information Retrieval and Natural Language Processing, Second Edition.” Learning to Rank for Information Retrieval and Natural Language Processing, Second Edition (2014).
- Liu, Tie-Yan. “Learning to rank for information retrieval.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (2009): n. pag.
- [2010.06467] Pretrained Transformers for Text Ranking: BERT and Beyond (Lin et al.)
Rank Aggregation
Suppose there are M queries and N documents, there will be a ranking list for each of n queries. The goal is to aggregate these n ranking lists into one ranking list.
The simplest rank aggregation method is called Borda count. The Borda count algorithm operates on the ranking lists by the following steps:
- Step 1: Aligning ranking list of ranks by document indexes.
- Step 2: Using the total document number N to subtract each entry in the aligned ranking list.
- Step 3: Summing up the transformed ranking lists and generating a ranking based on this summed ranking list.
For example, the lists A, B, C, A, C, B and B, A, C:
- Step 1: After alignment by index
A,B, andC, the ranking lists of ranks become1, 2, 3,1, 3, 2, and2, 1, 3. - Step 2: Using N=3 to subtract each entry gives us
2, 1, 0,2, 0, 1, and1, 2, 0. - Step 3: The summed ranking list of ranks is
5, 3, 1. Therefore, the initial 3 ranking lists is converted to one single ranking list:A, B, C.
This could be easily implemented in Python as following:
from collections import defaultdict
def borda_count(votes):
N = len(votes[0])
score_dict = defaultdict(int)
for vote in votes:
for rank, candidate in enumerate(vote):
score_dict[candidate] += N - rank
aggregated_ranks = sorted(score_dict.keys(), key=score_dict.get, reverse=True)
return aggregated_ranks
votes = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(borda_count(votes))