Add DebiasedMultipleNegativesRankingLoss to the losses #3148
This PR introduces the Debiased Contrastive Loss from the paper "Debiased Contrastive Learning" (Chuang et al., NeurIPS 2020). The purpose of this loss is to reduce false negative bias, which arises when samples treated as negatives are actually semantically similar to the anchor. This bias degrades embedding quality and downstream-task performance, as the paper's results show.
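For reference, the $M = 1$ objective can be restated as below. The notation is my own restatement rather than the paper's exact symbols: $s$ is cosine similarity, $t$ the temperature, $u_1, \dots, u_N$ the $N$ negatives drawn for anchor $x$, $\tau^+$ (`tau_plus`) the assumed class prior, i.e. the probability that a sampled "negative" actually shares the anchor's class, and $\tau^- = 1 - \tau^+$.

```math
\mathcal{L} = -\log \frac{e^{s(x, x^+)/t}}{e^{s(x, x^+)/t} + N\, g}, \qquad
g = \max\left\{ \frac{1}{\tau^-}\left( \frac{1}{N}\sum_{i=1}^{N} e^{s(x, u_i)/t} - \tau^+ e^{s(x, x^+)/t} \right),\; e^{-1/t} \right\}
```

With $\tau^+ = 0$ the correction vanishes and $N g$ is just the usual sum over negatives; the lower bound $e^{-1/t}$ (the smallest value $e^{s/t}$ can take for cosine similarity) keeps the denominator positive when the correction term dominates.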
The integration follows the same structure as the other losses in the `losses` package, with full documentation and a `citation` method referencing the original work. This loss extends `MultipleNegativesRankingLoss` with one additional hyper-parameter, `tau_plus`, which controls the bias correction. As a drop-in extension, it is therefore compatible with methods like GenQ (see the Query Generation example).

In this implementation, I focus on the case $M = 1$, where $M$ is the number of positive examples associated with each anchor, i.e. each anchor has exactly one positive sample. The approach can be extended to handle $M \geq 1$ positives per anchor, which could be a direction for future development.
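To make the $M = 1$ case concrete, here is a minimal standalone PyTorch sketch of the debiased in-batch objective. It is not the code added in this PR: the function name `debiased_mnrl_loss` and its signature are hypothetical. It assumes cosine similarity with `scale` acting as $1/t$ (defaulting to 20.0, assumed to mirror the default scale of `MultipleNegativesRankingLoss`), that row $i$ of `positives` is the positive for anchor $i$, and that the other $B - 1$ rows serve as in-batch negatives (no extra hard-negative column).

```python
import math

import torch
import torch.nn.functional as F


def debiased_mnrl_loss(
    anchors: torch.Tensor,
    positives: torch.Tensor,
    tau_plus: float = 0.1,
    scale: float = 20.0,
) -> torch.Tensor:
    """Hypothetical sketch of a debiased in-batch contrastive loss (M = 1).

    anchors, positives: (B, D) embeddings. Row i of `positives` is the single
    positive for anchor i; all other rows act as (possibly false) negatives.
    `scale` plays the role of 1/t, where t is the temperature.
    """
    # Scaled cosine-similarity matrix s(x, .)/t, shape (B, B)
    sim = F.normalize(anchors, dim=-1) @ F.normalize(positives, dim=-1).T * scale
    exp_sim = torch.exp(sim)
    batch_size = sim.size(0)

    # e^{s(x, x+)/t}: the positive sits on the diagonal
    pos = exp_sim.diagonal()

    # Sum over the N = B - 1 in-batch negatives (off-diagonal entries)
    off_diag = ~torch.eye(batch_size, dtype=torch.bool, device=sim.device)
    neg_sum = (exp_sim * off_diag).sum(dim=1)
    n = batch_size - 1

    # Debiased estimate of N * g: subtract the expected false-negative mass
    ng = (neg_sum - tau_plus * n * pos) / (1.0 - tau_plus)

    # Lower bound N * e^{-1/t}, since cosine similarity is at least -1
    ng = torch.clamp(ng, min=n * math.exp(-scale))

    return (-torch.log(pos / (pos + ng))).mean()


torch.manual_seed(0)
anchors, positives = torch.randn(8, 16), torch.randn(8, 16)
print(debiased_mnrl_loss(anchors, positives, tau_plus=0.1))
```

As a sanity check on this sketch, setting `tau_plus=0.0` leaves the negative sum unchanged, so the result equals cross-entropy over the scaled similarity matrix with labels `0..B-1`, i.e. plain in-batch `MultipleNegativesRankingLoss` without hard negatives. The clamp matters for larger `tau_plus`: without it, the corrected sum can go negative and the log would return NaN.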