How to use the fair-search plugin

Now it is time to finally perform a fair re-scoring.

The usual flow for the fairsearch plug-in is this one:

  • a user executes a query in the search engine, and during this process,
  • indicates s/he wants to apply the fairsearch plug-in.

To achive this we are going to use a functionality provided by Elasticsearch named re-scoring.

Assumptions and preconditions for this example

Lets suppose we have already in our search engine this set of documents:

Doc1 { body: "hello hello hello hello hello hello hello hello hello hello", gender: "m" }
Doc3 { body: "hello hello hello hello hello hello hello hello hello bye", gender: "m" }
Doc5 { body: "hello hello hello hello hello hello hello hello bye bye", gender: "m" }
Doc7 { body: "hello hello hello hello hello hello hello bye bye bye", gender: "m" }
Doc9 { body: "hello hello hello hello hello hello bye bye bye bye", gender: "m" }
Doc2 { body: "hello hello hello hello hello bye bye bye bye bye", gender: "f" }
Doc4 { body: "hello hello hello hello bye bye bye bye bye bye", gender: "f" }
Doc6 { body: "hello hello hello bye bye bye bye bye bye bye", gender: "f" }
Doc8 { body: "hello hello bye bye bye bye bye bye bye bye", gender: "f" }
Doc10 { body: "hello bye bye bye bye bye bye bye bye bye", gender: "f" }

In this example, women will be our protected category. As we see in the “body” of the documents above, the word “hello” occurs more in the ones having gender=m (male) than in the ones having gender=f (female).

How does a search looks like

Lets first imagine we execute a normal search for “hello”, one without using the fairsearch plug-in. The results would look like this:

GET test/_search
{
    "query": {
        "match": {
            "body": "hello"
        }
    }
}

This request will return all documents that match the word hello, sorted by their relevance scoring. For this particular dataset we would get these results:

Doc1, Doc3, Doc5, Doc7, Doc9, Doc2, Doc4, Doc6, Doc8, Doc10

and if we take a closer look, these will be:

m, m, m, m, m, f, f, f, f, f

with all men as first top results, however as we could see in the The Theory section, there are many situations where we might aim for a fair result. To achieve this we will use the plug-in.

A request with the re-score function will look like this:

GET test/_search
{
    "query": {
        "match": {
            "body": "hello"
        }
    },
    "rescore": {
        "fair_rescorer": {
            "protected_key": "gender",
            "protected_value": "f",
            "significance_level": 0.1,
            "min_proportion_protected": 0.6
        }
    }
}

this request is actually doing an Elasticsearch match query (it also could do any other type of query, for example a bool or a multi match), then after the results are calculated (in every shard) it applies the fair topK algorithm.

This request will give you a response where the target number of protected elements will be scored in relevant places, that for our example will be:

Doc1, Doc3, Doc2, Doc5, Doc4, Doc7, Doc9, Doc6, Doc8, Doc10

or, in terms of gender:

m, m, f, m, f, m, m, f, f, f

with a much fairer distribution of elements of the protected class (i.e., some women appear in the top positions).

Details on parameters

As we saw in the The Theory, for the fair query to work, we would need an mtable against which the re-scoring will be executed. To make the use of the plug-in easier, the re-score function does that in the background. So, the previous re-score call will create an mtable with the following parameters:

  • k=10, as 10 is the default documents that are returned by Elasticsearch
  • α=0.1, as that is the significance level specified in the query
  • p=0.6, as that is the minimum proportion of protected elements specified in the top results

Elasticsearch size parameters

Elasticsearch supports the following size parameters:

  • size, which is the total number of documents returned by ElasticSearch (defaults to 10)
  • window_size, which is the number of documents to be re-ranked by the re-score function and returned to the user (defaults to 10).

For example, if you make a query like this:

GET test/_search
{
   "size": 50,
    "query": {
        "match": {
            "body": "hello"
        }
    },
    "rescore": {
        "window_size": 10,
        "fair_rescorer": {
            "protected_key": "gender",
            "protected_value": "f",
            "significance_level": 0.1,
            "min_proportion_protected": 0.6
        }
    }
}

This will try to find initial 50 documents that match the query and, then, re-rank and return only the top 10. So, the plug-in will choose window_size as the k parameter or, if smaller, the actual numbers of documents initially returned.

Ensuring there are protected elements in the topK results

A recommended way of using the plug-in is to specify a higher number for size, so that more elements can be included in the initial list for more fair re-ranking. Then, after the re-scoring phase, only window_size number of documents will be returned.