-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using Matryoshka loss with Cached losses #3059
Comments
Hello! Yes, I agree that this would be quite valuable. I use MatryoshkaLoss whenever possible, and I don't see myself often training with non-Cached MNRL anymore. So, the incompatibility is bothersome. Either way, I'm certainly open to PRs here.
|
I think I was able to reproduce the error, was it something like Edit: Confirmed, I did not see the |
That was indeed the error. I added the warning after I noticed that MatryoshkaLoss was not compatible with the Cached... losses. |
Hi how can i reproduce this issue? i'd like to try and see if i can fix it :) thanks in advance |
Here is a fix proposal @tomaarsen : #3065 Tell me what you think about it and if you like it I can finish it. I tested it with cachedMNR and it seems to work 👍 @Squishedmac just setup a training script with cachedMNR and matryoshkaloss and it should break |
I think you indeed found the cause with your PR - the backwards call doesn't include the embedding reduction. Hmm, I wonder if there's a nice solution that stays within the MatryoshkaLoss fully, but I'm not very confident right now.
|
I created a draft PR for another potential solution #3068 |
Solid work, experimenting with that PR now to see what the performance is like @Marcel256 |
I want to use Matryoshka Loss along with Cached Losses like
CachedMultipleNegativesRankingLoss
andCachedGISTEmbedLoss
. However, the current implementation doesn't allow for this combination. I’ve reviewed the existing code and I’m willing to also contribute this feature. Are there any plans already on how this feature could be implemented?One simple idea is to create subclasses of the existing losses and add the Matryoshka logic within the calculate_loss functions. This way, we can keep the current losses unchanged, though it would lead to some code duplication. Other approaches would require more refactoring.
If anyone has ideas or suggestions, I'd love to hear them!
The text was updated successfully, but these errors were encountered: