Add diffllama #34083

Merged 57 commits on Jan 7, 2025
Changes from 1 commit

Commits (57)
3bd9e34
first adding diffllama
weak-kajuma Oct 11, 2024
269055e
add Diff Attention and other but still with errors
weak-kajuma Oct 11, 2024
dbbf073
complate make attention Diff-Attention
weak-kajuma Oct 16, 2024
c4ea9df
fix some bugs which may be caused by transformer-cli while adding model
weak-kajuma Oct 16, 2024
e072544
fix a bug caused by forgetting KV cache...
weak-kajuma Oct 16, 2024
674d7a2
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
9eac636
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
0e99dbd
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
1e445c7
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
cca6a5c
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
dd167af
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
23099cb
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
faac378
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
53e13aa
I found Attention missed implemented from paper still on e072544a3bfc…
weak-kajuma Oct 20, 2024
63b018a
re-implemented
weak-kajuma Oct 20, 2024
204bec8
adding groupnorm
weak-kajuma Oct 20, 2024
bce12e5
align with transformers code style
weak-kajuma Oct 20, 2024
44d8423
fix typo
weak-kajuma Oct 20, 2024
6dc6f81
adding groupnorm
weak-kajuma Oct 20, 2024
48b38e8
change SdpaAttention to DiffSdpaAttention
weak-kajuma Oct 20, 2024
997f561
fix bug
weak-kajuma Oct 20, 2024
107bd3c
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 21, 2024
26307d9
fix bugs of places of "GroupNorm with scale" and etc
weak-kajuma Oct 21, 2024
22aa145
Revert "fix bugs of places of "GroupNorm with scale" and etc"
weak-kajuma Oct 21, 2024
cc472be
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
e834129
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
e9d94e5
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
0352999
remove missed type
weak-kajuma Oct 22, 2024
843178a
add diffllama model_doc
weak-kajuma Oct 29, 2024
71c8d12
apply make style/quality
weak-kajuma Oct 29, 2024
fea95fa
apply review comment about model
weak-kajuma Oct 30, 2024
b3f8dd5
apply review comment about test
weak-kajuma Oct 30, 2024
50ce353
place diffllama alphabetically on the src/transformers/__init__.py
weak-kajuma Oct 30, 2024
6f25333
fix forgot code
weak-kajuma Oct 31, 2024
dd2282e
Supports parameters that are not initialized with standard deviation …
weak-kajuma Oct 31, 2024
9e7a9c3
add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPO…
weak-kajuma Oct 31, 2024
8c98d19
remove unused property of config
weak-kajuma Nov 1, 2024
cbf217d
add to supported model list
weak-kajuma Nov 1, 2024
c873982
add to spda supported model list
weak-kajuma Nov 1, 2024
b003a53
fix copyright, remove pretraining_tensor_parallel, and modify for ini…
weak-kajuma Nov 7, 2024
37c7a88
remove unused import and etc.
weak-kajuma Nov 7, 2024
ba92d5c
empty commit
weak-kajuma Nov 7, 2024
8cc823e
empty commit
weak-kajuma Nov 7, 2024
d47631d
empty commit
weak-kajuma Nov 7, 2024
c6932de
apply modular transformers but with bugs
weak-kajuma Nov 20, 2024
48e16cf
revert prev commit
weak-kajuma Dec 1, 2024
a44f95d
create src/transformers/model/diffllama/modular_diffllama.py
weak-kajuma Dec 1, 2024
c45aa59
run utils/modular_model_converter.py
weak-kajuma Dec 1, 2024
c5741eb
empty commit
weak-kajuma Dec 1, 2024
ea622ce
leaner modular diffllama
weak-kajuma Dec 6, 2024
e30c298
Merge branch 'huggingface:main' into add_diffllama
weak-kajuma Dec 6, 2024
3f85c22
remove more and more in modular_diffllama.pt
weak-kajuma Dec 6, 2024
87d034d
remove more and more in modular_diffllama.pt
weak-kajuma Dec 6, 2024
4660c6e
resolve missing docstring entries
weak-kajuma Dec 21, 2024
b4ff5f3
force reset
weak-kajuma Dec 21, 2024
484a493
Merge branch 'huggingface:main' into add_diffllama
weak-kajuma Dec 21, 2024
0ce2023
convert modular
weak-kajuma Dec 21, 2024
Update src/transformers/models/diffllama/modeling_diffllama.py
fix dividing by sqrt(self.head_dim) twice

Co-authored-by: Minho Ryu <[email protected]>
weak-kajuma and bzantium authored Oct 20, 2024
commit dd167af8c0206c91985946e131c8a95fd6c48c1b
1 change: 0 additions & 1 deletion src/transformers/models/diffllama/modeling_diffllama.py
@@ -297,7 +297,6 @@ def __init__(self, config: DiffLlamaConfig, layer_idx: Optional[int] = None):
         self.hidden_size = config.hidden_size
         self.num_heads = config.num_attention_heads
         self.head_dim = getattr(config, "head_dim", self.hidden_size // self.num_heads)
-        self.scaling = self.head_dim ** -0.5
         self.num_key_value_heads = config.num_key_value_heads
         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
         # under this are not used
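
The deleted line is the point of the commit description: if the attention forward already divides the query/key dot product by sqrt(head_dim), also applying a precomputed self.scaling = head_dim ** -0.5 scales the scores a second time. A minimal standalone sketch of that pitfall, using hypothetical shapes and variable names rather than the PR's actual forward code:

import math
import torch

head_dim = 64
q = torch.randn(1, 8, 16, head_dim)   # (batch, num_heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, head_dim)

# The attribute removed by this commit.
scaling = head_dim ** -0.5

# Scores are already scaled once here.
scores = torch.matmul(q, k.transpose(2, 3)) / math.sqrt(head_dim)

# Applying the precomputed scaling on top divides by sqrt(head_dim) again,
# leaving the scores a factor of sqrt(head_dim) too small.
buggy_scores = scores * scaling

print(torch.allclose(buggy_scores, scores / math.sqrt(head_dim)))  # True

Removing the unused scaling attribute and relying on the single division in the forward pass (or on the built-in scaling of torch.nn.functional.scaled_dot_product_attention, if that path is used) keeps the attention scores scaled exactly once.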