[Bug Report] RMSNormPre in TransformerLens may be different from the Llama source code? #657
Labels
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
needs-investigation
Issues that need to be recreated, or investigated before work can be done
In modeling_llama.py, the LlamaRMSNorm module returns weight * (scaled hidden_states), as shown below.
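For reference, this is a sketch of that forward pass, paraphrased from memory from the HuggingFace implementation (dtype casting omitted; exact code may differ slightly between transformers versions):

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        # Learnable per-channel scale, applied after normalization
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # The output is weight * normalized hidden_states
        return self.weight * hidden_states
```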
By contrast, the RMSNormPre definition in TransformerLens seems to output only the scaled hidden_states, with no weight applied.
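A sketch of what I see in transformer_lens/components.py (again from memory, so details such as type hints and comments may not be exact):

```python
import torch
import torch.nn as nn
from transformer_lens.hook_points import HookPoint

class RMSNormPre(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        self.eps = self.cfg.eps
        self.hook_scale = HookPoint()       # [batch, pos, 1]
        self.hook_normalized = HookPoint()  # [batch, pos, d_model]

    def forward(self, x):
        # RMS scale factor over the last dimension
        scale = self.hook_scale((x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt())
        # Only divides by the scale -- there is no weight parameter here
        return self.hook_normalized(x / scale)
```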
How TransformerBlock uses RMSNormPre:
It seems that in TransformerBlock's forward pass, the weight that LlamaRMSNorm would multiply in is never applied; a simplified sketch of what I mean is below.
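Roughly, the block appears to do something like this when ln1/ln2 are RMSNormPre instances (heavily simplified; hook points are omitted and the argument names are illustrative rather than the exact TransformerLens signatures):

```python
# Simplified sketch of TransformerBlock.forward, not the actual source
def block_forward(self, resid_pre):
    normalized_resid_pre = self.ln1(resid_pre)   # only x / rms(x), no weight
    attn_out = self.attn(normalized_resid_pre)
    resid_mid = resid_pre + attn_out

    normalized_resid_mid = self.ln2(resid_mid)   # again, no weight applied
    mlp_out = self.mlp(normalized_resid_mid)
    resid_post = resid_mid + mlp_out
    return resid_post
```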
I want to hook the values after RMSNorm has been applied to each residual stream, so I tried to find the parameters inside RMSNorm and noticed something that looks odd.
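For context, this is roughly how I am reading the post-norm activations out of the cache (the model name and prompt are just placeholders; loading Llama itself may require passing an hf_model/tokenizer, depending on the TransformerLens version):

```python
from transformer_lens import HookedTransformer

# Placeholder model for illustration
model = HookedTransformer.from_pretrained("gpt2")

logits, cache = model.run_with_cache("Hello world")

# Activations right after the pre-attention norm of layer 0
normalized = cache["blocks.0.ln1.hook_normalized"]  # [batch, pos, d_model]
scale = cache["blocks.0.ln1.hook_scale"]            # the norm's scale factor
print(normalized.shape, scale.shape)
```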