Inda’s semantic search requires a domain-specific semantic model: this model specializes in vocabulary, idioms and semantic meanings typical of the recruiting field.
As explained in our previous article, the evaluation method is the key to choosing the most appropriate semantic model and optimizing its parameters.
First and foremost, to define a model’s evaluation protocol, we need to mathematically define a metric that assigns a score to the model, based on how it performs a given task (the so-called downstream task).
In particular, Inda’s semantic model is required to sort words (skills, job titles, etc.) or texts (resumes, job descriptions, etc.) based on their semantic similarity to given concepts, words, or texts.
Hence, the key for the evaluation is comparing the rankings produced by the model with the correct ranking; in a subsequent article, we will provide more details on how to obtain the correct ranking – the so-called ground truth – or, more precisely, its best approximation.
The evaluation of the distance between rankings is a well-studied problem in statistics, and we can rely on many mathematical definitions (the interested reader can refer, for instance, to the Wikipedia page on rank correlation for an overview). Two coefficients that are quite popular in the scientific literature are Spearman’s rho and Kendall’s tau: both take two rankings as input and return a correlation index between -1 and 1, depending on how similar the two rankings are. In particular, Spearman’s rho is based on the differences between the positions that each item occupies in the two rankings, while Kendall’s tau is based on the number of item pairs that the two rankings order differently.
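To make the two coefficients concrete, here is a minimal Python sketch of their textbook definitions for rankings without ties. The function names and the toy rankings are our own illustrative choices, not part of Inda's codebase.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings without ties.

    rank_a[i] and rank_b[i] are the positions (1..n) assigned to item i.
    """
    n = len(rank_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings without ties.

    Counts item pairs ordered the same way (concordant) and
    in opposite ways (discordant) by the two rankings.
    """
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy example (hypothetical data): the model swaps the first two items.
truth = [1, 2, 3, 4, 5]
model = [2, 1, 3, 4, 5]
print(spearman_rho(truth, model))
print(kendall_tau(truth, model))
```

Identical rankings give 1 for both coefficients, and fully reversed rankings give -1. In practice, library routines such as `scipy.stats.spearmanr` and `scipy.stats.kendalltau` compute these quantities and also handle ties.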
In our case, the goal is to compare the ranking produced by the semantic model with the correct ranking: the metric must determine how similar the former is to the latter, and the discrepancies between the two rankings correspond to the model’s errors.
Spearman’s rho and Kendall’s tau coefficients, although very useful from a theoretical and general perspective, don’t take into account a fundamental aspect of semantic model evaluation: the positions in the ranking have different relevance, and thus the severity of an error depends on the positions involved.
This aspect has been observed and described in many scientific papers concerning many different Natural Language Processing tasks, but it can be intuitively understood with a simple example. Suppose that we need to sort 100 words based on their relevance to a given keyword: an error that exchanges the first and fourth ranks is clearly more critical than one that switches the 81st and 84th ranks.
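The 100-word example can be checked numerically. The sketch below uses the standard unweighted definitions (for rankings without ties) and a hypothetical helper, `swapped_ranking`, to show that the plain coefficients assign exactly the same score to both errors.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Unweighted Spearman's rho for two rankings without ties."""
    n = len(rank_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


def kendall_tau(rank_a, rank_b):
    """Unweighted Kendall's tau for two rankings without ties."""
    n = len(rank_a)
    score = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        score += 1 if sign > 0 else -1
    return score / (n * (n - 1) / 2)


def swapped_ranking(n, p, q):
    """Hypothetical helper: the ranking 1..n with ranks p and q exchanged."""
    ranks = list(range(1, n + 1))
    ranks[p - 1], ranks[q - 1] = ranks[q - 1], ranks[p - 1]
    return ranks


truth = list(range(1, 101))
top_error = swapped_ranking(100, 1, 4)       # ranks 1 and 4 exchanged
bottom_error = swapped_ranking(100, 81, 84)  # ranks 81 and 84 exchanged

# Both errors move two items by three positions and invert five pairs,
# so the unweighted coefficients cannot tell them apart.
print(spearman_rho(truth, top_error) == spearman_rho(truth, bottom_error))
print(kendall_tau(truth, top_error) == kendall_tau(truth, bottom_error))
```

Both comparisons evaluate to `True`: the unweighted coefficients are blind to where in the ranking an error occurs.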
For this reason, semantic model evaluations within the Inda Data Science team rely on weighted versions of Spearman’s rho and Kendall’s tau coefficients, so that each error contributes to the correlation coefficient with a weight that depends on the positions involved.
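As an illustration of the idea, the following sketch implements one possible weighted Kendall's tau. The weighting is an assumption made purely for this example: each pair is weighted by the sum of 1/rank of its two items in the correct ranking, so errors near the top count more. It is not the weighting scheme proposed in our paper, and the function name is hypothetical.

```python
from itertools import combinations


def weighted_kendall_tau(truth, model, weight=lambda rank: 1.0 / rank):
    """Weighted Kendall's tau for two rankings without ties.

    Each item pair contributes +w if the two rankings order it the same
    way and -w otherwise, where w is the sum of the weights of the two
    items' positions in the correct ranking (an illustrative assumption).
    """
    numerator = denominator = 0.0
    for i, j in combinations(range(len(truth)), 2):
        w = weight(truth[i]) + weight(truth[j])
        sign = (truth[i] - truth[j]) * (model[i] - model[j])
        numerator += w if sign > 0 else -w
        denominator += w
    return numerator / denominator


def swapped_ranking(n, p, q):
    """Hypothetical helper: the ranking 1..n with ranks p and q exchanged."""
    ranks = list(range(1, n + 1))
    ranks[p - 1], ranks[q - 1] = ranks[q - 1], ranks[p - 1]
    return ranks


truth = list(range(1, 101))
tau_top = weighted_kendall_tau(truth, swapped_ranking(100, 1, 4))
tau_bottom = weighted_kendall_tau(truth, swapped_ranking(100, 81, 84))

# The error at the top of the ranking is now penalized more heavily.
print(tau_top < tau_bottom)
```

With this weighting, exchanging ranks 1 and 4 lowers the score more than exchanging ranks 81 and 84, which matches the intuition described above. The same principle can be applied to Spearman's rho by weighting each squared rank difference.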
The recipe to determine the error’s weight based on the involved positions is non-trivial and far from unique. However, in a research article that we will present at the Empirical Methods in Natural Language Processing conference – the paper can be downloaded at the end of the page – we propose some general arguments that provide guidelines to choose “good weights”.
For all technical details, please refer to our scientific paper Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models, which you can download via the form below.