[Bug Report] RMSNormPre in Transformer_lens is maybe different from Llama source code? #657

Open
wangyifei0047 opened this issue Jul 6, 2024 · 1 comment
Labels
complexity-moderate Moderately complicated issues for people who have intermediate experience with the code needs-investigation Issues that need to be recreated, or investigated before work can be done

Comments

@wangyifei0047

In LlamaModeling.py, the LlamaRMSNorm function returns the weights multiplied by the scaled hidden_states, as shown below:
image

The RMSNormPre definition in Transformer_lens, by contrast, seems to return only the scaled hidden_states, without multiplying by any weights:

image
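To make the difference concrete, here is a minimal dependency-free sketch of the two variants as described above. It is not the library code: the function names, the toy vector, the gain values, and the eps default are all made up for illustration, and plain Python lists stand in for tensors.

```python
import math

def rms_norm_pre(x, eps=1e-5):
    """Scale-only RMS normalization: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = math.sqrt(mean_sq + eps)
    return [v / scale for v in x]

def rms_norm_with_weight(x, weight, eps=1e-5):
    """Same normalization, followed by an elementwise learned gain."""
    normed = rms_norm_pre(x, eps)
    return [w * v for w, v in zip(weight, normed)]

# Toy residual-stream vector and a hypothetical learned gain.
x = [1.0, -2.0, 3.0, 0.5]
weight = [0.5, 2.0, 1.0, 1.5]

pre = rms_norm_pre(x)
full = rms_norm_with_weight(x, weight)

# The two outputs differ exactly by the per-dimension gain.
ratios = [f / p for f, p in zip(full, pre)]
print(ratios)
```

If the two implementations really differ only in this way, the ratio between their outputs at each position should match the gain vector, which is something that can be checked on real activations.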

How Transformer_Block uses RMSNormPre:
it seems that in the forward pass of Transformer_Block, the weights of LlamaRMSNorm are still not applied.

image

I want to hook the values after RMSNorm is applied to each residual stream, so I went looking for the RMSNorm parameters and found this discrepancy.

  • [yes] I have checked that there is no similar issue in the repo (required)
@4gatepylon

Have you tried comparing intermediate values using hooks? It may be that the RMSNorm weights were folded into the weights of a subsequent layer.
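For reference, the folding idea can be sketched in a few lines. This is a toy illustration under assumed values, not the library's implementation: the gain, the matrix W, and the helper names are hypothetical. It shows that applying the gain after normalization and then a linear map gives the same result as skipping the gain and scaling the rows of the linear map instead. I believe this is what the fold_ln option of HookedTransformer.from_pretrained controls, but that is worth verifying against the library docs.

```python
import math

def rms_norm_pre(x, eps=1e-5):
    """Scale-only RMS normalization: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = math.sqrt(mean_sq + eps)
    return [v / scale for v in x]

def linear(x, W):
    """Compute x @ W for W given as a list of rows with shape [d_in][d_out]."""
    d_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(d_out)]

# Toy input, hypothetical norm gain, and hypothetical next-layer weights.
x = [1.0, -2.0, 3.0, 0.5]
gain = [0.5, 2.0, 1.0, 1.5]
W = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.7], [0.9, 0.05]]

normed = rms_norm_pre(x)

# Unfolded: normalize, apply the gain, then the linear layer.
unfolded = linear([g * v for g, v in zip(gain, normed)], W)

# Folded: normalize only, with the gain absorbed into row i of W.
W_folded = [[gain[i] * w for w in row] for i, row in enumerate(W)]
folded = linear(normed, W_folded)

max_diff = max(abs(a - b) for a, b in zip(unfolded, folded))
print(max_diff)
```

If folding is what happens here, the output of the layer after the norm should match the reference model, while the value hooked directly after the norm will differ from LlamaRMSNorm's output by the per-dimension gain.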

@bryce13950 bryce13950 added complexity-moderate Moderately complicated issues for people who have intermediate experience with the code needs-investigation Issues that need to be recreated, or investigated before work can be done labels Nov 14, 2024