Commit
Modular
transformers
: modularity and inheritance for new model addi…
…tions (#33248) * update exampel * update * push the converted diff files for testing and ci * correct one example * fix class attributes and docstring * nits * oups * fixed config! * update * nitd * class attributes are not matched against the other, this is missing * fixed overwriting self.xxx now onto the attributes I think * partial fix, now order with docstring * fix docstring order? * more fixes * update * fix missing docstrings! * examples don't all work yet * fixup * nit * updated * hick * update * delete * update * update * update * fix * all default * no local import * fix more diff * some fix related to "safe imports" * push fixed * add helper! * style * add a check * all by default * add the * update * FINALLY! * nit * fix config dependencies * man that is it * fix fix * update diffs * fix the last issue * re-default to all * alll the fixes * nice * fix properties vs setter * fixup * updates * update dependencies * make sure to install what needs to be installed * fixup * quick fix for now * fix! * fixup * update * update * updates * whitespaces * nit * fix * simplify everything, and make it file agnostic (should work for image processors) * style * finish fixing all import issues * fixup * empty modeling should not be written! * Add logic to find who depends on what * update * cleanup * update * update gemma to support positions * some small nits * this is the correct docstring for gemma2 * fix merging of docstrings * update * fixup * update * take doc into account * styling * update * fix hidden activation * more fixes * final fixes! * fixup * fixup instruct blip video * update * fix bugs * align gemma2 with the rest as well * updats * revert * update * more reversiom * grind * more * arf * update * order will matter * finish del stuff * update * rename to modular * fixup * nits * update makefile * fixup * update order of the checks! 
* fix * fix docstring that has a call inside * fiix conversion check * style * add some initial documentation * update * update doc * some fixup * updates * yups * Mostly todo gimme a minut * update * fixup * revert some stuff * Review docs for the modular transformers (#33472) Docs * good update * fixup * mmm current updates lead to this code * okay, this fixes it * cool * fixes * update * nit * updates * nits * fix doc * update * revert bad changes * update * updates * proper update * update * update? * up * update * cool * nits * nits * bon bon * fix * ? * minimise changes * update * update * update * updates? * fixed gemma2 * kind of a hack * nits * update * remove `diffs` in favor of `modular` * fix make fix copies --------- Co-authored-by: Lysandre Debut <[email protected]>
1 parent 75b7485, commit 317e069
Showing 41 changed files with 6,504 additions and 778 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
# Modular transformers | ||
|
||
`transformers` is an opinionated framework; our philosophy is defined in the following [conceptual guide](./philosophy). | ||
|
||
The core of that philosophy is exemplified by the [single model, single file](https://huggingface.co/blog/transformers-design-philosophy) | ||
aspect of the library. The downside of this design is that it limits inheritance and the ability to import components from | ||
one file into another within the library. | ||
|
||
As a result, model components tend to be repeated across many files. There are as many attention layers defined | ||
in `transformers` as there are models, and a significant number of those are identical to each other. | ||
The unfortunate consequence is that independent implementations tend to diverge as fixes and changes get applied | ||
to specific parts of the code. | ||
|
||
To mitigate this issue, we introduced the concept of "copies" across the library. By adding a comment indicating | ||
that a piece of code is a copy of another, we can enforce through CI and local commands that copies do not diverge. However, | ||
while the complexity is low, keeping copies in sync is often quite tedious. | ||
|
||
Finally, this adds significant overhead to model contributions, which we would like to remove. | ||
Contributing a model often requires adding modeling code (~1k lines), a processor (~500 lines), tests, docs, | ||
etc. Model contribution PRs rarely add fewer than 3-5k lines of code, with much of this code being boilerplate. | ||
|
||
This raises the bar for contributions, and with Modular Transformers, we're aiming to lower the bar to a much more | ||
acceptable point. | ||
|
||
## What is it? | ||
|
||
Modular Transformers introduces the concept of a "modular" file to a model folder. This modular file accepts code | ||
that isn't typically accepted in modeling/processing files, as it allows importing from neighbouring models as well | ||
as inheriting from classes defined in other models. | ||
|
||
This modular file defines models, processors, and the configuration class that would otherwise be defined in their | ||
respective modules. | ||
|
||
Finally, this feature introduces a new `linter` which will "unravel" the modular file into the "single model, single | ||
file" directory structure. These files are auto-generated every time the script is run, which reduces the required | ||
contribution to the modular file alone, and therefore to only the changes between the contributed model and others. | ||
|
||
Model users will end up importing and using the single-file interface, so no change is expected here. Doing this, we | ||
hope to combine the best of both worlds: enabling simple contributions while sticking to our philosophy. | ||
|
||
This is therefore a replacement for the `# Copied from` markers, and previously contributed models can be expected to | ||
be moved to the new Modular Transformers format in the coming months. | ||
|
||
### Details | ||
|
||
The "linter", which unravels the inheritance and creates all single-files from the modular file, will flatten the | ||
inheritance while trying to be invisible to Python users. At this time, the linter flattens a **single** level of | ||
inheritance. | ||
|
||
For example: | ||
- If a configuration class inherits from another and adds/deletes an argument, the generated file will either directly | ||
reference it (in case of addition) or completely remove it (in case of deletion). | ||
- If a class inherits from another, for example `class GemmaModel(LlamaModel):`, dependencies are automatically | ||
inferred. All submodules will be automatically inferred from the superclass. | ||
|
||
You should be able to write everything (the tokenizer, the image processor, the model, the config) in this `modular` | ||
file, and the corresponding files will be created for you. | ||
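
To make the idea of flattening one level of inheritance concrete, here is a minimal sketch in plain Python. The class names (`ParentConfig`, `ChildConfigModular`, `ChildConfigGenerated`) are hypothetical, and the "generated" class is written by hand for illustration; it is not actual output of the converter.

```python
# Hypothetical parent, standing in for a config defined in another model's file.
class ParentConfig:
    model_type = "parent"

    def __init__(self, hidden_size=64, num_layers=2):
        self.hidden_size = hidden_size
        self.num_layers = num_layers


# What a contributor might write in a modular file: only the differences.
class ChildConfigModular(ParentConfig):
    model_type = "child"

    def __init__(self, hidden_size=64, num_layers=2, rope_theta=10000.0):
        super().__init__(hidden_size=hidden_size, num_layers=num_layers)
        self.rope_theta = rope_theta  # newly added argument


# Roughly what a flattened, standalone version could look like: no parent
# class, and the inherited attribute assignments are inlined.
class ChildConfigGenerated:
    model_type = "child"

    def __init__(self, hidden_size=64, num_layers=2, rope_theta=10000.0):
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rope_theta = rope_theta


modular = ChildConfigModular(rope_theta=5000.0)
generated = ChildConfigGenerated(rope_theta=5000.0)
same_state = vars(modular) == vars(generated)
```

Both classes end up with the same instance state, but only the second one can live in a file that imports nothing from a neighbouring model.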
|
||
### Enforcement | ||
|
||
[TODO] We are introducing a new test that makes sure the generated content matches what is present in the `modular_xxxx.py` file. | ||
|
||
### Examples | ||
|
||
Here is a quick example with BERT and RoBERTa. The two models are intimately related: their modeling implementation | ||
differs solely by a change in the embedding layer. | ||
|
||
Instead of redefining the model entirely, here is what the `modular_roberta.py` file looks like for the modeling & | ||
configuration classes (for the sake of the example, the tokenizer is ignored for now, as it is very different). | ||
|
||
```python | ||
from torch import nn | ||
from ..bert.configuration_bert import BertConfig | ||
from ..bert.modeling_bert import ( | ||
    BertModel, | ||
    BertEmbeddings, | ||
    BertForMaskedLM | ||
) | ||
|
||
# The RoBERTa config is identical to BERT's config | ||
class RobertaConfig(BertConfig): | ||
    model_type = 'roberta' | ||
|
||
# We redefine the embeddings here to highlight the padding ID difference, and we redefine the position embeddings | ||
class RobertaEmbeddings(BertEmbeddings): | ||
    def __init__(self, config): | ||
        # Pass the config object itself; it must not be called | ||
        super().__init__(config) | ||
|
||
        self.padding_idx = config.pad_token_id | ||
        self.position_embeddings = nn.Embedding( | ||
            config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx | ||
        ) | ||
|
||
# The RoBERTa model is identical to the BERT model, except for the embedding layer. | ||
# We redefine the embeddings above, so here there is no need to do additional work | ||
class RobertaModel(BertModel): | ||
    def __init__(self, config): | ||
        super().__init__(config) | ||
        self.embeddings = RobertaEmbeddings(config) | ||
|
||
|
||
# The heads now only need to redefine the model inside to the correct `RobertaModel` | ||
class RobertaForMaskedLM(BertForMaskedLM): | ||
    def __init__(self, config): | ||
        super().__init__(config) | ||
        self.model = RobertaModel(config) | ||
``` | ||
|
||
Note that if you do not use the dependency that you defined, you will get the following error: | ||
|
||
```bash | ||
ValueError: You defined `RobertaEmbeddings` in the modular_roberta.py, it should be used | ||
when you define `BertModel`, as it is one of it's direct dependencies. Make sure | ||
you use it in the `__init__` function. | ||
``` | ||
Additionally, you may find a list of examples here: | ||
## What it is not | ||
It is not a replacement for the modeling code (yet?), and if your model is not based on anything else that ever existed, then you can add a `modeling` file as usual. |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Using the `modular_converter` linter | ||
|
||
`pip install libcst` is a must! | ||
|
||
# `sh examples/modular-transformers/convert_examples.sh` to get the converted outputs | ||
|
||
The modular converter is a new `linter` specific to `transformers`. It allows us to unpack inheritance in Python to convert a modular file like `modular_gemma.py` into a `single model single file`. | ||
|
||
Examples of possible usage are available in the `examples/modular-transformers` directory; see `modular_gemma` for a full model example. | ||
|
||
`python utils/modular_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/modular-transformers/modular_my_new_model2.py"` | ||
|
||
## How it works | ||
We use the `libcst` parser to produce a syntax tree representation of the `modular_xxx.py` file. For any imports that are made from `transformers.models.modeling_xxxx`, we parse the source code of that module and build a class dependency mapping, which allows us to unpack the dependencies that the modular file inherits. | ||
|
||
The code from the `modular` file and the class dependency mapping are "merged" to produce the single model single file. | ||
We use ruff to automatically remove the potential duplicate imports. | ||
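
As an illustration of what a class dependency mapping can look like, here is a minimal sketch. It uses the standard-library `ast` module as a stand-in so that it runs without extra installs; the converter itself relies on `libcst`. The source string and the `class_dependencies` helper are hypothetical and are not part of `utils/modular_model_converter.py`.

```python
import ast

# Hypothetical source standing in for a modeling file we import from.
PARENT_SOURCE = '''
class Embeddings:
    pass

class Attention:
    pass

class Model:
    def __init__(self):
        self.embeddings = Embeddings()
        self.attention = Attention()
'''


def class_dependencies(source):
    """Map each top-level class to the other top-level classes it references."""
    tree = ast.parse(source)
    class_names = {node.name for node in tree.body if isinstance(node, ast.ClassDef)}
    mapping = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Collect every bare name used inside the class that is itself a class.
            used = {
                child.id
                for child in ast.walk(node)
                if isinstance(child, ast.Name) and child.id in class_names
            }
            mapping[node.name] = sorted(used - {node.name})
    return mapping


deps = class_dependencies(PARENT_SOURCE)
```

With a mapping like this, a tool can tell that copying `Model` into a standalone file also requires copying `Embeddings` and `Attention`.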
|
||
## Why do we use `libcst` instead of the native `ast` module?
Python's built-in `ast` module is powerful, but it does not preserve comments or the original code formatting (docstring layout included). Thus we decided to go with `libcst`. |
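
The limitation can be seen with a short round trip through the standard library alone. This is a minimal sketch with a made-up source string; `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Made-up snippet with a comment and unusual spacing.
SOURCE = '''
def forward(x):
    # TODO: apply the attention mask here
    return x


config = {  'hidden_size' : 64  }
'''

tree = ast.parse(SOURCE)
regenerated = ast.unparse(tree)  # requires Python 3.9+

# The comment is not part of the abstract syntax tree, so it is lost.
comment_survives = "# TODO" in regenerated
# The original spacing is lost as well; the dict is re-emitted in a normalized form.
spacing_survives = "{  'hidden_size' : 64  }" in regenerated
```

Both flags come out `False`, which is why a parser that keeps the concrete syntax is a better fit for regenerating readable source files.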
Empty file.