How to use the fair-search plugin
Now it is finally time to perform a fair re-scoring.
The usual flow for the fairsearch plugin is as follows:
- a user executes a query in the search engine, and
- while doing so, indicates that they want to apply the fairsearch plugin.
To achieve this, we use a feature provided by Elasticsearch called re-scoring.
Assumptions and preconditions for this example
Let's suppose we already have the following set of documents in our search engine:
Doc1 { body: "hello hello hello hello hello hello hello hello hello hello", gender: "m" }
Doc3 { body: "hello hello hello hello hello hello hello hello hello bye", gender: "m" }
Doc5 { body: "hello hello hello hello hello hello hello hello bye bye", gender: "m" }
Doc7 { body: "hello hello hello hello hello hello hello bye bye bye", gender: "m" }
Doc9 { body: "hello hello hello hello hello hello bye bye bye bye", gender: "m" }
Doc2 { body: "hello hello hello hello hello bye bye bye bye bye", gender: "f" }
Doc4 { body: "hello hello hello hello bye bye bye bye bye bye", gender: "f" }
Doc6 { body: "hello hello hello bye bye bye bye bye bye bye", gender: "f" }
Doc8 { body: "hello hello bye bye bye bye bye bye bye bye", gender: "f" }
Doc10 { body: "hello bye bye bye bye bye bye bye bye bye", gender: "f" }
In this example, women will be our protected category. As the “body” of the documents above shows, the word “hello” occurs more often in the documents with gender=m (male) than in those with gender=f (female).
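To follow along, the documents first need to be indexed. The sketch below builds a newline-delimited payload for Elasticsearch's `_bulk` endpoint from the document set above. The numeric ids (mirroring the names Doc1 to Doc10) and the helper names `make_body` and `bulk_payload` are assumptions made for illustration; the index name `test` is the one used by the search requests in this example.

```python
import json

# (doc id, number of "hello" words, gender); every body has ten words in total.
# The numeric ids are an assumption, mirroring the names Doc1..Doc10.
DOCS = [
    (1, 10, "m"), (3, 9, "m"), (5, 8, "m"), (7, 7, "m"), (9, 6, "m"),
    (2, 5, "f"), (4, 4, "f"), (6, 3, "f"), (8, 2, "f"), (10, 1, "f"),
]


def make_body(hello_count, total_words=10):
    """Build a body made of `hello_count` hellos, padded with byes."""
    words = ["hello"] * hello_count + ["bye"] * (total_words - hello_count)
    return " ".join(words)


def bulk_payload(index="test"):
    """Return the NDJSON body for a request to the _bulk endpoint."""
    lines = []
    for doc_id, hello_count, gender in DOCS:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps({"body": make_body(hello_count), "gender": gender}))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_payload())
```

The printed payload can be sent to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.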
What does a search look like
Let's first imagine we execute a normal search for “hello”, without using the fairsearch plugin. The request would look like this:
GET test/_search
{
  "query": {
    "match": {
      "body": "hello"
    }
  }
}
This request returns all documents that match the word “hello”, sorted by their relevance score. For this particular dataset we would get these results:
Doc1, Doc3, Doc5, Doc7, Doc9, Doc2, Doc4, Doc6, Doc8, Doc10
In terms of gender, these are:
m, m, m, m, m, f, f, f, f, f
All the men occupy the top positions. However, as we saw in the Motivation section, there are many situations where we might aim for a fairer result. To achieve this we will use the plugin.
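The baseline ordering above can be reproduced offline with a deliberately simplified stand-in for relevance scoring: ranking by how often the query term occurs. This is only a sketch. Elasticsearch's real scoring takes more factors into account, but because all ten bodies have the same length, a higher count of “hello” is assumed here to translate into a higher score. The function name `rank_by_term_frequency` is hypothetical.

```python
# (name, number of "hello" words in the ten-word body, gender), as listed above.
DOCS = [
    ("Doc1", 10, "m"), ("Doc2", 5, "f"), ("Doc3", 9, "m"), ("Doc4", 4, "f"),
    ("Doc5", 8, "m"), ("Doc6", 3, "f"), ("Doc7", 7, "m"), ("Doc8", 2, "f"),
    ("Doc9", 6, "m"), ("Doc10", 1, "f"),
]


def rank_by_term_frequency(docs):
    """Order documents by descending count of the query term.

    A simplification of relevance scoring, not the engine's actual formula.
    """
    return sorted(docs, key=lambda doc: doc[1], reverse=True)


ranking = rank_by_term_frequency(DOCS)
names = [name for name, _, _ in ranking]
genders = [gender for _, _, gender in ranking]

print(", ".join(names))    # Doc1, Doc3, Doc5, Doc7, Doc9, Doc2, Doc4, Doc6, Doc8, Doc10
print(", ".join(genders))  # m, m, m, m, m, f, f, f, f, f
```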
A request with the rescore function will look like this:
GET test/_search
{
  "query": {
    "match": {
      "body": "hello"
    }
  },
  "rescore": {
    "fair_rescorer": {
      "protected_key": "gender",
      "protected_value": "f",
      "significance_level": 0.1,
      "min_proportion_protected": 0.6
    }
  }
}
This request runs an ordinary Elasticsearch match query; it could be any other type of query, for example a bool or a multi_match query. Then, after the results are calculated (in every shard), it applies the fair top-k algorithm.
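The same request can also be issued from a script. The following is a minimal sketch using only the Python standard library. The helper name `build_fair_search_request` and the address `http://localhost:9200` are assumptions; the `fair_rescorer` parameters are exactly those shown in the request above. It uses POST, which Elasticsearch accepts for `_search` alongside GET.

```python
import json
import urllib.request


def build_fair_search_request(base_url, index, text, protected_key,
                              protected_value, significance_level,
                              min_proportion_protected):
    """Build (but do not send) a _search request with the fair rescorer."""
    body = {
        "query": {"match": {"body": text}},
        "rescore": {
            "fair_rescorer": {
                "protected_key": protected_key,
                "protected_value": protected_value,
                "significance_level": significance_level,
                "min_proportion_protected": min_proportion_protected,
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local cluster address; adjust to your own setup.
request = build_fair_search_request(
    "http://localhost:9200", "test", "hello", "gender", "f", 0.1, 0.6
)

# To send it against a running cluster that has the plugin installed:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```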
The response to this request places the target number of protected elements in relevant positions. For our example, this will be:
Doc1, Doc3, Doc2, Doc5, Doc4, Doc7, Doc9, Doc6, Doc8, Doc10
In terms of gender:
m, m, f, m, f, m, m, f, f, f
This is a much fairer distribution of elements of the protected class (i.e., some women appear in the top positions).
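One way to make the difference concrete is to count how many protected elements appear in each top-k prefix of the two rankings. The sketch below does only that bookkeeping on the two gender sequences shown in this example. It does not implement the fair top-k algorithm itself, and the function name `protected_per_prefix` is hypothetical.

```python
PROTECTED = "f"

# Gender sequences of the plain and the re-scored rankings from this example.
plain = ["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"]
fair = ["m", "m", "f", "m", "f", "m", "m", "f", "f", "f"]


def protected_per_prefix(genders, protected=PROTECTED):
    """Return, for each k, the number of protected elements in the top k."""
    counts = []
    running = 0
    for gender in genders:
        running += int(gender == protected)
        counts.append(running)
    return counts


plain_counts = protected_per_prefix(plain)
fair_counts = protected_per_prefix(fair)

for k, (p, f) in enumerate(zip(plain_counts, fair_counts), start=1):
    print(f"top {k:2d}: plain={p} fair={f}")
```

In the top five positions, for instance, the plain ranking contains no women, while the re-scored ranking contains two.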