Diff converter v2 #30868
Merged
Changes from all commits (158 commits)
f02e2fb
current working example!
ArthurZucker 564813d
commit regex and result file
ArthurZucker bd59e58
update
ArthurZucker 0bb0af9
nit
ArthurZucker 1fa297c
push the conversion file
ArthurZucker eb5c2e2
oups
ArthurZucker e08d8eb
roadmap and nits
ArthurZucker 92b6218
attempt diffs for 3 files
ArthurZucker d68766a
persimmon
ArthurZucker 022727c
nit
ArthurZucker 740e5bd
Merge branch 'main' of github.com:huggingface/transformers into refac…
ArthurZucker 7545c5f
add diff file that is the same as the modeling_llama.py
ArthurZucker e467d2f
fix rope nits
ArthurZucker 1632e0f
updates
ArthurZucker 22ff159
updates with converted versions
ArthurZucker 1aabcc1
give some breathing space to the code
ArthurZucker 2a654ec
delete
ArthurZucker 8752d35
update
ArthurZucker ca181ab
update
ArthurZucker 3a3510a
push the actual result
ArthurZucker 0782ffd
update regex patterns
ArthurZucker 580fbe1
update regex patterns
ArthurZucker a47468a
fix some issues
ArthurZucker 774a4af
fix some issues
ArthurZucker 8fe406f
fix some issues
ArthurZucker d5c0004
updates
ArthurZucker d3ab98e
updates
ArthurZucker eaaf34f
updates
ArthurZucker 45f20f5
updates
ArthurZucker daebeea
updates
ArthurZucker 3dedb93
revert changes done to llama
ArthurZucker f3fe0b3
updates
ArthurZucker 35576ac
update gemma
ArthurZucker 709429a
updates
ArthurZucker cdb8c6b
oups
ArthurZucker ce615ff
current state
ArthurZucker 7b79b4d
current state
ArthurZucker c9fea75
update
ArthurZucker 8fe59a5
ouiiii
ArthurZucker fca954d
nit
ArthurZucker c44f827
clear diffs
ArthurZucker df9e783
nit
ArthurZucker c804b4b
fixup
ArthurZucker 6a5264d
update
ArthurZucker f5ebef0
doc 🚀
ArthurZucker 39ec61a
:fire:
ArthurZucker 24e072e
for now use gemma
ArthurZucker a5b8780
deal with comments
ArthurZucker 768801c
style
ArthurZucker 274ac88
handle funtions
ArthurZucker e606c51
deal with assigns
ArthurZucker 075be8c
todos
ArthurZucker 67471e6
process inheritage
ArthurZucker 39f696e
keep decorators?
ArthurZucker e3be54c
🤗
ArthurZucker 65a00ce
deal with duplicates
ArthurZucker 292e573
fixup
ArthurZucker 6c09d23
correctly remove duplicate code
ArthurZucker 52b70fd
run ruff post script
ArthurZucker 4aec181
ruff deals pretty well with imports, let's leave it to him
ArthurZucker c45466e
ah maybe not lol
ArthurZucker f8587d7
for now remove all imports from child.
ArthurZucker 07a90cc
nit
ArthurZucker b036a2a
conversion of llama
ArthurZucker 0ced2bc
okay
ArthurZucker 4e8a23e
convert starcoder2
ArthurZucker 38286ad
Merge branch 'main' of github.com:huggingface/transformers into diff-…
ArthurZucker 9dbb22a
synch with main
ArthurZucker d5b10f7
update llama diff
ArthurZucker 29e3381
updates
ArthurZucker 262c06b
https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the i…
ArthurZucker fdc48d8
updates
ArthurZucker 43d7809
okay actual state
ArthurZucker c8e64ed
non zero exit
ArthurZucker 6147d3a
update!
ArthurZucker 53a4ce8
revert unrelated
ArthurZucker 0c7e43e
remove other diff files
ArthurZucker 10b5591
updates
ArthurZucker adc3f92
cleanup
ArthurZucker 3abd9f5
update
ArthurZucker 380b87f
less diff!
ArthurZucker 2df4ec6
stash
ArthurZucker 337321e
current updates
ArthurZucker 585686e
updates
ArthurZucker 91f45f8
No need for call
ArthurZucker 6fb42c2
finished fining deps
ArthurZucker b0853cb
update
ArthurZucker e62a5bb
current changes
ArthurZucker 40c5e6d
current state
ArthurZucker 49656b3
current state
ArthurZucker 8256a73
new status
ArthurZucker 4ead65b
nit
ArthurZucker b888fcd
finally
ArthurZucker 80363e3
fixes
ArthurZucker 7898d32
nits
ArthurZucker 793f638
order is now expected
ArthurZucker d6ef9e8
use logger info instead of prints
ArthurZucker 1ce5c1b
fixup
ArthurZucker 0990414
up
ArthurZucker 54af887
nit
ArthurZucker 494e6ba
update
ArthurZucker 6c48657
nits
ArthurZucker df19157
Merge branch 'main' of github.com:huggingface/transformers into diff-…
ArthurZucker f0068b7
update
ArthurZucker 6c423ce
correct merge
ArthurZucker 9d62ba5
update
ArthurZucker d1bc03b
update
ArthurZucker 28b5596
update
ArthurZucker 43d8d71
add warning
ArthurZucker 85bccc4
update caution message
ArthurZucker f1e1dec
update
ArthurZucker 7ea9bcd
better merging strategy
ArthurZucker 0faa82d
copy class statements :wink
ArthurZucker 1836a75
fixups
ArthurZucker 1128029
nits
ArthurZucker 42f640f
update
ArthurZucker ab3d410
Apply suggestions from code review
ArthurZucker ac0dc69
nits
ArthurZucker 1fd611c
Merge branch 'diff-converter' of github.com:huggingface/transformers …
ArthurZucker 85d2a50
smaller header
ArthurZucker dcee16e
do cleanup some stuff
ArthurZucker 0f4e05f
even simpler header?
ArthurZucker 058b6fa
fixup
ArthurZucker e3e6cca
updates
ArthurZucker 331d8a4
ruff
ArthurZucker 9828ffc
update examples
ArthurZucker 5a1cccd
nit
ArthurZucker 98c0a91
TODO
ArthurZucker 6207b52
state
ArthurZucker 64422e5
OUUUUUUF
ArthurZucker 8a85473
current state
ArthurZucker 513b933
nits
ArthurZucker 751c4db
final state
ArthurZucker 16b6aed
add a readme
ArthurZucker 2e74992
fixup
ArthurZucker fa8a86c
Merge branch 'main' of github.com:huggingface/transformers into diff-…
ArthurZucker 065cd1a
remove diff llama
ArthurZucker e1b0262
fix
ArthurZucker d7355db
nit
ArthurZucker c27e85c
dummy noy funny
ArthurZucker fc3c9e7
ruff format tests src utils --check
ArthurZucker ecc0aaa
everless diffs
ArthurZucker 5797c42
less diffs and fix test
ArthurZucker 54764f5
fixes
ArthurZucker 0422b9c
naming nit?
ArthurZucker d014449
update converter and add supper example
ArthurZucker f124cf9
nits
ArthurZucker 07c2aa9
updated for function signatures
ArthurZucker 2b96630
update
ArthurZucker 151cd71
update
ArthurZucker 03ac95c
add converted dummies
ArthurZucker e782306
autoformat
ArthurZucker 1839193
single target assign fix
ArthurZucker d9e1bf4
fixup
ArthurZucker 3eb121c
fix some imports
ArthurZucker 63b1bc1
fixes
ArthurZucker f667a9a
don't push them
ArthurZucker 969cdbf
`# noqa: F841`
ArthurZucker
Files changed
@@ -0,0 +1,20 @@
# Using the `diff_converter` linter

`pip install libcst` is a must!

# `sh examples/diff-conversion/convert_examples.sh` to get the converted outputs

The diff converter is a new `linter` specific to `transformers`. It allows us to unpack inheritance in Python and convert a modular `diff` file like `diff_gemma.py` into a `single model single file`.

Examples of possible usage are available in `examples/diff-conversion`, or see `diff_gemma` for a full-model example.

`python utils/diff_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/diff-conversion/diff_my_new_model2.py"`

## How it works
We use the `libcst` parser to produce an AST representation of the `diff_xxx.py` file. For any import made from `transformers.models.modeling_xxxx`, we parse the source code of that module and build a class dependency mapping, which allows us to unpack the dependencies the diff file relies on.

The code from the `diff` file and the class dependency mapping are "merged" to produce the single model single file.
We use `ruff` to automatically remove the potential duplicate imports.
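
To make that step concrete, here is a minimal sketch of the idea. It is an illustration only, not the code in `utils/diff_model_converter.py`; the `DiffVisitor` class and its heuristics are invented for the example. It uses `libcst` to record which classes a diff file imports from `transformers.models` and which of them its own classes inherit from.

```python
# A minimal sketch of the idea, not the actual utils/diff_model_converter.py.
# The DiffVisitor class and its heuristics are invented for illustration only.
import libcst as cst
from libcst.helpers import get_full_name_for_node


class DiffVisitor(cst.CSTVisitor):
    def __init__(self):
        self.model_imports = {}  # imported name -> transformers module it comes from
        self.parents = {}        # class defined in the diff file -> its base classes

    def visit_ImportFrom(self, node: cst.ImportFrom) -> None:
        if node.module is None or isinstance(node.names, cst.ImportStar):
            return
        module = get_full_name_for_node(node.module) or ""
        # Only keep imports that come from transformers.models.xxx
        if module.startswith("transformers.models"):
            for alias in node.names:
                self.model_imports[alias.name.value] = module

    def visit_ClassDef(self, node: cst.ClassDef) -> None:
        self.parents[node.name.value] = [get_full_name_for_node(base.value) for base in node.bases]


source = open("examples/diff-conversion/diff_my_new_model.py").read()
visitor = DiffVisitor()
cst.parse_module(source).visit(visitor)
print(visitor.model_imports)  # {'LlamaConfig': 'transformers.models.llama.configuration_llama'}
print(visitor.parents)        # {'MyNewModelConfig': ['LlamaConfig']}
```

The real converter goes further: it parses the source of the imported modeling module, resolves the dependency graph of those parent classes, and merges their code with the diff file to produce the generated single file.
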
## Why do we use `libcst` instead of the native AST?
The builtin `ast` module is super powerful, but round-tripping through it does not keep comments or code formatting. Thus we decided to go with `libcst`.
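
As a quick illustration of that difference (again just a sketch, not part of the PR): round-tripping a file through the builtin `ast` module drops comments, while `libcst` gives the source back byte for byte.

```python
import ast

import libcst as cst

src = "x = 1  # keep me\n"

print(ast.unparse(ast.parse(src)))  # -> "x = 1" (the comment is lost)
print(cst.parse_module(src).code)   # -> "x = 1  # keep me" (identical to the input)
```
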
@@ -0,0 +1,10 @@
#!/bin/bash

# Iterate over each diff file in examples/diff-conversion
for file in examples/diff-conversion/diff_*; do
    # Check if it's a regular file
    if [ -f "$file" ]; then
        # Call the Python script with the file name as an argument
        python utils/diff_model_converter.py --files_to_parse "$file"
    fi
done
@@ -0,0 +1,44 @@
from math import log
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


def _pre_process_input(input_ids):
    print(log(input_ids))
    return input_ids


# example where we need some deps and some functions
class DummyModel(LlamaModel):
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        input_ids = _pre_process_input(input_ids)

        return super().forward(
            None,
            attention_mask,
            position_ids,
            past_key_values,
            inputs_embeds,
            use_cache,
            output_attentions,
            output_hidden_states,
            return_dict,
            cache_position,
        )
@@ -0,0 +1,14 @@
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to add a new config argument and new arg doc
# here there is no `ARG` so we are gonna take parent doc
class MyNewModelConfig(LlamaConfig):
    r"""
    mlp_bias (`bool`, *optional*, defaults to `False`)
    """

    def __init__(self, mlp_bias=True, new_param=0, **super_kwargs):
        self.mlp_bias = mlp_bias
        self.new_param = new_param
        super().__init__(**super_kwargs)
@@ -0,0 +1,31 @@
from transformers.models.gemma.modeling_gemma import GemmaForSequenceClassification
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to modify the docstring
class MyNewModel2Config(LlamaConfig):
    r"""
    This is the configuration class to store the configuration of a [`GemmaModel`]. It is used to instantiate a Gemma
    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the Gemma-7B.
    e.g. [google/gemma-7b](https://huggingface.co/google/gemma-7b)
    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.
    Args:
        vocab_size (`int`, *optional*, defaults to 256000):
            Vocabulary size of the Gemma model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`GemmaModel`]
    ```python
    >>> from transformers import GemmaModel, GemmaConfig
    >>> # Initializing a Gemma gemma-7b style configuration
    >>> configuration = GemmaConfig()
    >>> # Initializing a model from the gemma-7b style configuration
    >>> model = GemmaModel(configuration)
    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""


# Example where alllllll the dependencies are fetched to just copy the entire class
class MyNewModel2ForSequenceClassification(GemmaForSequenceClassification):
    pass
@@ -0,0 +1,30 @@
# Example where we only want to overwrite the defaults of an init

from transformers.models.gemma.configuration_gemma import GemmaConfig


class NewModelConfig(GemmaConfig):
    def __init__(
        self,
        vocab_size=256030,
        hidden_size=64,
        intermediate_size=90,
        num_hidden_layers=28,
        num_attention_heads=16,
        num_key_value_heads=16,
        head_dim=256,
        hidden_act="gelu_pytorch_tanh",
        hidden_activation=None,
        max_position_embeddings=1500,
        initializer_range=0.02,
        rms_norm_eps=1e-6,
        use_cache=True,
        pad_token_id=0,
        eos_token_id=1,
        bos_token_id=2,
        tie_word_embeddings=True,
        rope_theta=10000.0,
        attention_bias=False,
        attention_dropout=0.0,
    ):
        super().__init__()
@@ -0,0 +1,38 @@
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


# example where we need some deps and some functions
class SuperModel(LlamaModel):
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        cache_position: Optional[torch.LongTensor] = None,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        out = super().forward(
            input_ids,
            attention_mask,
            position_ids,
            past_key_values,
            inputs_embeds,
            use_cache,
            output_attentions,
            output_hidden_states,
            return_dict,
            cache_position,
        )
        out.logits *= 2**4
        return out
Review comment: Is it supposed to be a real path to a diff file or just `<path_to_diff_file.py>`?

Reply: just `<path_to_diff_file.py>`, but we can also add the real path