[Question] How to load a model in smaller precision? #784
Comments
What error did you receive when you ran this in TransformerLens?
I received the standard CUDA out-of-memory error. For context, I have an RTX 3060 laptop GPU.
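To put the out-of-memory error in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at different precisions. The parameter count below is a hypothetical placeholder (the thread does not name the model), and the estimate ignores activations, the CUDA context, and any temporary copies made during loading, so real usage will be higher.

```python
# Bytes used per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Estimate the memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


# Hypothetical parameter count, for illustration only.
n_params = 1_000_000_000

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(n_params, dtype):.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why a model can fit in bfloat16 but not in float32 on the same card.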
I am running this script
The only change I made was using HookedTransformer instead of a standard transformer. Here is a comparison of the outputs.
You should be able to load that model without issues. If you are getting memory errors, my best guess is that something else is running and leaving TransformerLens without enough memory. Looking at your code, I am curious: is there a reason you are creating your own tokenizer? That shouldn't account for everything, but it does add more overhead than needed. Also, make sure you are using the most recent version of TransformerLens. A fairly large memory issue was fixed a few months back, so an older version of TransformerLens could also be part of the problem.
I used an external tokenizer only to show that I was changing just that one line. I was using version 2.9.0.
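The two checks suggested above (installed version and memory taken by other processes) can be sketched roughly as follows. This is a minimal sketch, not an official diagnostic: it assumes the package is installed under the distribution name `transformer-lens` and that PyTorch with CUDA support is available; the helper names are made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name: str = "transformer-lens"):
    """Return the installed version string, or None if it is not installed.

    The default distribution name is an assumption about how the
    package was installed.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def gpu_memory_gib(device_index: int = 0):
    """Return (free, total) GPU memory in GiB, or None without CUDA."""
    import torch  # imported lazily so the version check works without it

    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return free_bytes / 2**30, total_bytes / 2**30
```

Calling `gpu_memory_gib()` before loading the model shows how much memory is already in use by other processes; if the free figure is well below the total, something else is occupying the card.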
Question
Is doing
enough to load a model in bfloat16?
The model loads fine directly from huggingface but not through transformer lens.
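For reference, one way to request bfloat16 in TransformerLens is the `dtype` argument of `HookedTransformer.from_pretrained`. The sketch below is a hedged illustration rather than a confirmed fix for this issue: the wrapper names are hypothetical, the model name is left to the caller, and it is an assumption that passing `dtype` keeps peak memory low during loading, since weight processing may still allocate temporary copies.

```python
def load_hooked_bf16(model_name: str, device: str = "cuda"):
    """Load a HookedTransformer with bfloat16 weights (hypothetical helper)."""
    # Imported lazily so this file can be imported without the libraries.
    import torch
    from transformer_lens import HookedTransformer

    return HookedTransformer.from_pretrained(
        model_name,
        dtype=torch.bfloat16,  # requested parameter precision
        device=device,
    )


def param_dtypes(model) -> set:
    """Collect the distinct dtypes of a model's parameters."""
    return {p.dtype for p in model.parameters()}
```

After loading, `param_dtypes(model)` should contain only `torch.bfloat16` if the request took effect; any other entry means some parameters were kept in a different precision.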