Using Matryoshka loss with Cached losses #3059

Closed
Marcel256 opened this issue Nov 15, 2024 · 8 comments · Fixed by #3068
Labels
good second issue (Looking for help; but may be hard to implement)

Comments

@Marcel256
Contributor

I want to use MatryoshkaLoss along with Cached losses such as CachedMultipleNegativesRankingLoss and CachedGISTEmbedLoss. However, the current implementation doesn't allow for this combination. I've reviewed the existing code and I'm willing to contribute this feature myself. Are there already any plans for how it could be implemented?

One simple idea is to create subclasses of the existing losses and add the Matryoshka logic within the calculate_loss functions. This way, we can keep the current losses unchanged, though it would lead to some code duplication. Other approaches would require more refactoring.

If anyone has ideas or suggestions, I'd love to hear them!
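
For reference, the combination in question looks roughly like this (the model name and dimensions below are only illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import (
    CachedMultipleNegativesRankingLoss,
    MatryoshkaLoss,
)

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Wrap a cached loss in MatryoshkaLoss. At the time of this issue, training
# with this combination fails with a tensor size mismatch.
inner_loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```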

@tomaarsen added the good second issue label Nov 15, 2024
@tomaarsen
Collaborator

Hello!

Yes, I agree that this would be quite valuable. I use MatryoshkaLoss whenever possible, and I don't see myself often training with non-Cached MNRL anymore. So, the incompatibility is bothersome.
Ideally, I'd like to keep all changes within MatryoshkaLoss itself, to keep things more modular and separate. I think it might be feasible; I just haven't looked into it too much. I know there's a shape issue, presumably because MatryoshkaLoss does some magic to reduce the embedding shapes. I suspect it's possible to modify the behaviour of MatryoshkaLoss when it wraps a Cached loss, for example to always return the full embedding when requires_grad=False, but this might not work for evaluation.
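
Roughly, that reduction amounts to truncating each embedding to the target dimension and re-normalizing before the wrapped loss is called, conceptually something like this (an illustration, not the actual implementation):

```python
import torch
import torch.nn.functional as F

def truncate_embeddings(embeddings: torch.Tensor, dim: int) -> torch.Tensor:
    # Keep only the first `dim` features and re-normalize. MatryoshkaLoss then
    # evaluates the wrapped loss once per configured dimension and sums the
    # (optionally weighted) results.
    return F.normalize(embeddings[..., :dim], p=2, dim=-1)

full = torch.randn(8, 768)
print(truncate_embeddings(full, 128).shape)  # torch.Size([8, 128])
```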

Either way, I'm certainly open to PRs here.

  • Tom Aarsen

@GTimothee
Copy link

GTimothee commented Nov 15, 2024

I think I was able to reproduce the error. Was it something like RuntimeError: inconsistent tensor size, expected tensor [768] and src [128] to have the same number of elements, but got 768 and 128 elements respectively?

Edit: Confirmed. At first I did not see the UserWarning: MatryoshkaLoss is not compatible with CachedMultipleNegativesRankingLoss.

@tomaarsen
Collaborator

That was indeed the error. I added the warning after I noticed that MatryoshkaLoss was not compatible with the Cached... losses.

@Squishedmac

Hi, how can I reproduce this issue? I'd like to try and see if I can fix it :) Thanks in advance!

@GTimothee

Here is a fix proposal, @tomaarsen: #3065

Tell me what you think about it, and if you like it I can finish it. I tested it with CachedMultipleNegativesRankingLoss and it seems to work 👍

@Squishedmac just set up a training script with CachedMultipleNegativesRankingLoss and MatryoshkaLoss and it should break.
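
A minimal sketch of such a script (the model, data, and dimensions are placeholders; any small anchor/positive dataset should do):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import (
    CachedMultipleNegativesRankingLoss,
    MatryoshkaLoss,
)

# Tiny (anchor, positive) dataset, just enough to trigger a training step.
train_dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "How do planes fly?"],
    "positive": [
        "Paris is the capital of France.",
        "Planes fly because their wings generate lift.",
    ],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
inner_loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=2)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[384, 256, 128, 64])

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()  # at the time of this issue, fails with the tensor size mismatch above
```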

@tomaarsen
Collaborator

I think you indeed found the cause with your PR: the backward call doesn't include the embedding reduction. Hmm, I wonder if there's a nice solution that stays fully within MatryoshkaLoss, but I'm not very confident right now.

  • Tom Aarsen

@Marcel256
Contributor Author

I created a draft PR with another potential solution: #3068
The main idea is to reuse the embeddings computed in the first step of the cached losses, with a separate decorator that handles the truncation of the embeddings.
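
For readers without the PR open, the general shape of that idea might look something like the following; the class and its methods are invented here for illustration and do not necessarily match what #3068 actually does:

```python
import torch
import torch.nn.functional as F

class TruncationDecorator:
    # Hypothetical illustration: wrap whatever callable produces the cached
    # embeddings and truncate its output to the current Matryoshka dimension,
    # so the already-computed full-size embeddings are reused for every
    # dimension instead of being re-encoded.
    def __init__(self, fn):
        self.fn = fn
        self.dim = None

    def set_dim(self, dim: int) -> None:
        self.dim = dim

    def __call__(self, *args, **kwargs):
        embeddings = self.fn(*args, **kwargs)
        if self.dim is None:
            return embeddings
        return F.normalize(embeddings[..., : self.dim], p=2, dim=-1)


# Usage sketch: 768-dim embeddings truncated to 256 dims.
decorated = TruncationDecorator(lambda batch: batch)
decorated.set_dim(256)
print(decorated(torch.randn(4, 768)).shape)  # torch.Size([4, 256])
```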

@tomaarsen
Collaborator

Solid work! I'm experimenting with that PR now to see what the performance is like, @Marcel256.
