The Theory

Ranking can be biased

Core concepts: ranking bias

Search engines today are used to rank many different types of items, including items that represent people. Job recruiting search engines, marketplaces for consulting and other services, dating apps, and similar systems have at their core the idea of ranking/ordering people from most relevant to least relevant, which often means from “best” to “worst”.

Traditional scoring methods such as TF-IDF or BM25 (two of the most popular) can introduce a certain degree of bias; the main motivation of this plugin is to provide methods for searching without this problem.

A computer system is biased [Friedman 1996] if:

It systematically and unfairly discriminate[s] against certain individuals or groups of individuals in favor of others. A system discriminates unfairly if it denies an opportunity or a good or if it assigns an undesirable outcome to an individual or a group of individuals on grounds that are unreasonable or inappropriate.

In algorithmic bias, an important concept is that of a protected group: a category of individuals protected by law, by voluntary commitments, or for other reasons. Search results are considered unfair if members of a protected group are systematically ranked lower than those of a non-protected group.

Examples where fair search would be required include the US Equal Employment Opportunity Commission, which sets a goal of 12% of workers with disabilities in federal agencies in the US, and Spain, where a minimum of 40% of political candidates in voting districts exceeding a certain size must be women.

The FA*IR algorithm

What is the fairness criterion applied by FA*IR?

A prefix of a list consists of its first elements; for instance, the list (A, B, C) has prefixes (A), (A, B), and (A, B, C).

The fairness criterion in FA*IR [Zehlike et al. 2017] requires that the number of protected elements in every prefix of the list corresponds to the number of protected elements we would expect if they were picked at random using Bernoulli trials (independent “coin tosses”) with success probability p.

This correspondence is not exact: a parameter α specifies the accepted probability of a Type I error, i.e., of rejecting a ranking that is in fact fair. A typical value of α could be 0.1, or 10%.

Given p, α, and k (the total length of the list to be returned), an M-table is computed. This M-table indicates the minimum number of protected elements required at every prefix.
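
To make this concrete, here is a minimal sketch of that computation; it is our illustration, not the plugin's code, and it assumes Python with scipy. For each prefix length i, it finds the smallest count m such that observing m or fewer protected items in i Bernoulli(p) trials still has probability above α:

    from scipy.stats import binom

    def m_table(k, p, alpha):
        # m_table(...)[i-1] is the minimum number of protected elements
        # required among the top-i positions: the smallest m such that
        # P(X <= m) > alpha for X ~ Binomial(i, p).
        table = []
        for i in range(1, k + 1):
            m = 0
            while binom.cdf(m, i, p) <= alpha:  # reject while too unlikely
                m += 1
            table.append(m)
        return table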

Example

This example illustrates how the re-ranker works; for now, we omit a correction on α that is explained in the next section.

Suppose p=0.5; this means that we would like a list in which protected candidates are at least 50% of every prefix. Suppose α=0.1, meaning we accept a 10% probability of a Type I error.

The M-table in this case is:

Position   1   2   3   4   5   6   7   8   9  10
M          0   0   0   1   1   1   2   2   3   3

This means that, among the top 3 elements, even if there is no protected item, we would still consider the list to be fair: if you toss a fair coin (p=0.5) 3 times, the chance of getting “heads” 3 times is 0.5^3 = 0.125, which is above 10% (remember α=0.1). However, among the top 4 items, at least one of them has to be protected, because the chance of getting heads 4 times in a row is 0.5^4 = 0.0625, which is below 10%; hence, with this α, it is not believable that the coin was fair in the first place.

The rest of the M-table is easy to interpret; for instance: among the top-5 elements there has to be at least 1 protected element, among the top-7 there must be at least 2, and among the top-9 there must be at least 3.
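
As a quick check, the sketch above reproduces the table for these parameters:

    print(m_table(10, 0.5, 0.1))
    # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]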

Corrections for multiple hypothesis testing

The FA*IR plug-in does not use the parameter α directly, but computes a corrected α, which is in general smaller. For instance, for p=0.5, α=0.1, k=100, the corrected α is 0.0207.

The corrected α accounts for the fact that, in a list of size k, there will be k tests performed, one for every prefix (100 in this example). Hence, the probability that a fair ranking fails in at least one prefix is larger than α (because there are k attempts being made). The correction mechanism is explained in the FA*IR paper [Zehlike et al. 2017].
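
To see why the correction is needed, here is a minimal Monte Carlo sketch; it is our illustration of the effect, not the paper's actual correction mechanism, it reuses the m_table function from above, and it assumes numpy. It estimates how often a genuinely fair ranking fails at least one prefix test when the unadjusted α is used:

    import numpy as np

    def overall_type_i_error(k, p, alpha, trials=20000, seed=42):
        # Estimate the probability that a fair ranking (each item
        # protected independently with probability p) violates the
        # unadjusted M-table at one or more of its k prefixes.
        rng = np.random.default_rng(seed)
        m = np.array(m_table(k, p, alpha))
        fails = 0
        for _ in range(trials):
            protected = rng.random(k) < p          # one simulated fair ranking
            prefix_counts = np.cumsum(protected)   # protected items per prefix
            if np.any(prefix_counts < m):          # some prefix fails its test
                fails += 1
        return fails / trials

    # For k=100, p=0.5, alpha=0.1 the estimate comes out well above 0.1;
    # FA*IR therefore uses a smaller, corrected alpha whose overall
    # failure probability is approximately the desired alpha.
    print(overall_type_i_error(100, 0.5, 0.1))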

References

[Friedman 1996] Friedman, B., & Nissenbaum, H. (1996). Bias in computer systems. ACM Transactions on Information Systems (TOIS), 14(3), 330-347.

[Zehlike et al. 2017] Zehlike, M., Bonchi, F., Castillo, C., Hajian, S., Megahed, M., and Baeza-Yates, R. (2017, November). FA*IR: A fair top-k ranking algorithm. Proc. CIKM 2017 (pp. 1569-1578). ACM Press.