[Proposal] Documentation: Map the Act Names to the Transformer #644
Labels
complexity-moderate
Moderately complicated issues for people who have intermediate experience with the code
documentation
Improvements or additions to documentation
Proposal
Create a figure that maps the act names to the transformer architecture.
Motivation
Names are just conventions. I find it hard to tell the exact position within the transformer block from the act name alone. For example, resid_pre could be taken before the residual stream splits off into a branch, or before the branch output is merged back in. So I end up placing it in context with the other act names and working by process of elimination, or modifying an activation to see which values change.
Pitch
I suggest using the figures from the Vaswani et al. paper and adding labeled arrows that point to the hook positions.
Alternatives
A list or table of (act name, description) pairs.
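To make this alternative concrete, here is a minimal sketch of what such a table could look like, kept as plain Python data so it can be printed or rendered into docs. It assumes the act names follow the TransformerLens-style convention for a standard sequential pre-LayerNorm block, and it covers only the residual-stream hooks. The names `ACT_NAMES`, `describe`, `position`, and `format_table` are hypothetical and not part of any library, and the descriptions should be checked against the actual model code before going into the documentation.

```python
# Hypothetical lookup table, not a library API.
# Assumption: a standard sequential pre-LayerNorm block with
# TransformerLens-style act names. Entries are listed in the order
# the activations are computed inside one block.
ACT_NAMES = [
    ("resid_pre", "residual stream entering the block, before it branches into attention"),
    ("attn_out", "attention output, before it is added back to the residual stream"),
    ("resid_mid", "residual stream after attn_out is added, before it branches into the MLP"),
    ("mlp_out", "MLP output, before it is added back to the residual stream"),
    ("resid_post", "residual stream after mlp_out is added; input to the next block"),
]


def describe(act_name: str) -> str:
    """Return the description for an act name, or raise KeyError if unknown."""
    for name, description in ACT_NAMES:
        if name == act_name:
            return description
    raise KeyError(f"unknown act name: {act_name}")


def position(act_name: str) -> int:
    """Return the 0-based computation order of an act name within a block."""
    for index, (name, _) in enumerate(ACT_NAMES):
        if name == act_name:
            return index
    raise KeyError(f"unknown act name: {act_name}")


def format_table() -> str:
    """Render the (act name, description) pairs as an aligned text table."""
    width = max(len(name) for name, _ in ACT_NAMES)
    lines = [f"{'act name'.ljust(width)}  description"]
    for name, description in ACT_NAMES:
        lines.append(f"{name.ljust(width)}  {description}")
    return "\n".join(lines)


print(format_table())
```

Keeping the entries in computation order means the table doubles as a textual version of the proposed figure: each row sits where its arrow would point on the block diagram.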
Checklist