Modular transformers: modularity and inheritance for new model additions (#33248)

* update exampel
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nitd
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* alll the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updats
* revert
* update
* more reversiom
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fiix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo gimme a minut
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472) Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies

---------

Co-authored-by: Lysandre Debut <[email protected]>
huggingface · Sep 24, 2024 · 317e069
1 parent 75b7485
commit 317e069

Showing 41 changed files with 6,504 additions and 778 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -137,7 +137,7 @@ jobs:
         parallelism: 1
         steps:
             - checkout
-            - run: uv pip install -e .
+            - run: uv pip install -e ".[quality]"
             - run:
                 name: Show installed libraries and their versions
                 command: pip freeze | tee installed.txt
@@ -162,13 +162,14 @@ jobs:
         parallelism: 1
         steps:
             - checkout
-            - run: uv pip install -e .
+            - run: uv pip install -e ".[quality]"
             - run:
                 name: Show installed libraries and their versions
                 command: pip freeze | tee installed.txt
             - store_artifacts:
                   path: ~/transformers/installed.txt
             - run: python utils/check_copies.py
+            - run: python utils/check_modular_conversion.py
             - run: python utils/check_table.py
             - run: python utils/check_dummies.py
             - run: python utils/check_repo.py

diff --git a/Makefile b/Makefile
@@ -36,6 +36,7 @@ autogenerate_code: deps_table_update
 
 repo-consistency:
 	python utils/check_copies.py
+	python utils/check_modular_conversion.py
 	python utils/check_table.py
 	python utils/check_dummies.py
 	python utils/check_repo.py
@@ -80,6 +81,7 @@ fixup: modified_only_fixup extra_style_checks autogenerate_code repo-consistency
 
 fix-copies:
 	python utils/check_copies.py --fix_and_overwrite
+	python utils/check_modular_conversion.py  --fix_and_overwrite
 	python utils/check_table.py --fix_and_overwrite
 	python utils/check_dummies.py --fix_and_overwrite
 	python utils/check_doctest_list.py --fix_and_overwrite
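
The `check_modular_conversion.py` steps added above run in check mode under `repo-consistency` and with `--fix_and_overwrite` under `fix-copies`. The script's internals are not shown in this part of the diff, so the following is only a minimal sketch of the general check-or-fix pattern, assuming the check boils down to comparing regenerated content with what is committed. The names `conversion_diff` and `check_or_fix` are hypothetical and are not taken from the script.

```python
import difflib


def conversion_diff(on_disk: str, regenerated: str, path: str) -> list[str]:
    """Return unified-diff lines between the committed and regenerated content."""
    return list(
        difflib.unified_diff(
            on_disk.splitlines(),
            regenerated.splitlines(),
            fromfile=f"{path} (on disk)",
            tofile=f"{path} (regenerated)",
            lineterm="",
        )
    )


def check_or_fix(on_disk: str, regenerated: str, path: str, fix_and_overwrite: bool = False) -> str:
    """Hypothetical helper: return the content that should end up on disk.

    In check mode a mismatch raises; in fix mode the regenerated content wins.
    """
    diff = conversion_diff(on_disk, regenerated, path)
    if not diff:
        return on_disk
    if fix_and_overwrite:
        return regenerated
    raise ValueError(f"{path} is out of date:\n" + "\n".join(diff))


# Illustrative inputs only; a real check would read these from files.
committed = "class RobertaModel:\n    pass\n"
fresh = "class RobertaModel:\n    hidden = 1\n"

fixed = check_or_fix(committed, fresh, "modeling_roberta.py", fix_and_overwrite=True)
try:
    check_or_fix(committed, fresh, "modeling_roberta.py")
    check_failed = False
except ValueError:
    check_failed = True
```

Keeping the check and the fix behind one flag mirrors how the other `utils/check_*.py` commands are invoked in this Makefile.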

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -5,6 +5,8 @@
     title: Quick tour
   - local: installation
     title: Installation
+  - local: add_new_model
+    title: Adding a new model to `transformers`
   title: Get started
 - sections:
   - local: pipeline_tutorial
@@ -149,6 +151,8 @@
     title: Interoperability with GGUF files
   - local: tiktoken
     title: Interoperability with TikToken files
+  - local: modular_transformers
+    title: Modularity in `transformers`
   title: Developer guides
 - sections:
   - local: quantization/overview

diff --git a/docs/source/en/modular_transformers.md b/docs/source/en/modular_transformers.md
@@ -0,0 +1,121 @@
+# Modular transformers
+
+`transformers` is an opinionated framework; our philosophy is defined in the following [conceptual guide](./philosophy).
+
+The core of that philosophy is exemplified by the [single model, single file](https://huggingface.co/blog/transformers-design-philosophy)
+aspect of the library. The downside of this approach is that it limits inheritance and the ability to import components
+from one file into another within the toolkit.
+
+As a result, model components tend to be repeated across many files. There are as many attention layers defined
+in `transformers` as there are models, and a significant number of those are identical to each other. 
+The unfortunate consequence is that independent implementations tend to diverge as fixes and changes get applied
+to specific parts of the code.
+
+To mitigate this issue, we introduced the concept of "copies" across the library. By adding a comment indicating
+that a piece of code is a copy of another, we can enforce through CI and local commands that copies do not diverge.
+However, while the complexity is low, keeping these copies in sync is often quite tedious.
+
+Finally, this adds significant overhead to contributing models, which we would like to remove.
+This approach often requires model contributions to add modeling code (~1k lines), a processor (~500 lines), tests,
+docs, etc. Model contribution PRs rarely add fewer than 3-5k lines of code, much of it boilerplate.
+
+This raises the bar for contributions, and with Modular Transformers, we're aiming to lower the bar to a much more
+acceptable point.
+
+## What is it?
+
+Modular Transformers introduces the concept of a "modular" file to a model folder. This modular file accepts code
+that isn't typically accepted in modeling/processing files, as it allows importing from neighbouring models as well
+as inheritance from one class to another.
+
+This modular file defines models, processors, and the configuration class that would otherwise be defined in their
+respective modules.
+
+Finally, this feature introduces a new `linter` which will "unravel" the modular file into the "single model, single
+file" directory structure. These files are auto-generated every time the script is run, which reduces the required
+contribution to the modular file alone, and therefore to only the changes between the contributed model and others.
+
+Model users will end up importing and using the single-file interface, so no change is expected here. Doing this, we
+hope to combine the best of both worlds: enabling simple contributions while sticking to our philosophy.
+
+This is therefore a replacement for the `# Copied from` markers, and previously contributed models can be expected to
+be moved to the new Modular Transformers format in the coming months.
+
+### Details 
+
+The "linter", which unravels the inheritance and creates all single-files from the modular file, will flatten the 
+inheritance while trying to be invisible to Python users. At this time, the linter flattens a **single** level of
+inheritance.
+
+For example:
+- If a configuration class inherits from another and adds/deletes an argument, the generated file will either directly 
+  reference it (in case of addition) or completely remove it (in case of deletion).
+- If a class inherits from another, for example `class GemmaModel(LlamaModel):`, dependencies are automatically
+  inferred. All submodules will be automatically inferred from the superclass.
+
+You should be able to write everything (the tokenizer, the image processor, the model, the config) in this `modular` 
+file, and the corresponding files will be created for you. 
+
+### Enforcement
+
+[TODO] We are introducing a new test that makes sure the generated content matches what is present in the `modular_xxxx.py` file.
+
+### Examples
+
+Here is a quick example with BERT and RoBERTa. The two models are intimately related: their modeling implementation 
+differs solely by a change in the embedding layer.
+
+Instead of redefining the model entirely, here is what the `modular_roberta.py` file looks like for the modeling &
+configuration classes (for the sake of the example, the tokenizer is ignored here, as it is very different).
+
+```python
+from torch import nn
+from ..bert.configuration_bert import BertConfig
+from ..bert.modeling_bert import (
+    BertModel,
+    BertEmbeddings,
+    BertForMaskedLM
+)
+
+# The RoBERTa config is identical to BERT's config
+class RobertaConfig(BertConfig):
+    model_type = 'roberta'
+
+# We redefine the embeddings here to highlight the padding ID difference, and we redefine the position embeddings
+class RobertaEmbeddings(BertEmbeddings):
+    def __init__(self, config):
+        super().__init__(config)
+
+        self.padding_idx = config.pad_token_id
+        self.position_embeddings = nn.Embedding(
+            config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
+        )
+
+# The RoBERTa model is identical to the BERT model, except for the embedding layer. 
+# We redefine the embeddings above, so here there is no need to do additional work
+class RobertaModel(BertModel):
+    def __init__(self, config):
+        super().__init__(config)
+        self.embeddings = RobertaEmbeddings(config)
+
+
+# The heads now only need to redefine the model inside to the correct `RobertaModel`
+class RobertaForMaskedLM(BertForMaskedLM):
+    def __init__(self, config):
+        super().__init__(config)
+        self.model = RobertaModel(config)
+```
+
+Note that if you do not use the dependency that you defined, you will get the following error:
+
+```bash
+ValueError: You defined `RobertaEmbeddings` in the modular_roberta.py, it should be used
+                                    when you define `BertModel`, as it is one of it's direct dependencies. Make sure
+                                    you use it in the `__init__` function.
+```
+
+Additionally, you may find a list of examples here:
+
+## What it is not
+
+It is not a replacement for the modeling code (yet?), and if your model is not based on anything else that ever existed, then you can add a `modeling` file as usual.
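
To make the single-level flattening described under "Details" more concrete, here is a toy sketch. It is not the converter from this commit: it uses the standard-library `ast` module rather than `libcst` (so comments and formatting are lost), handles only methods, and ignores `super()` calls, imports and renaming. The class names and the `flatten_one_level` helper are illustrative assumptions.

```python
import ast

# Stand-in for a class living in another model's modeling file.
PARENT_SRC = '''
class LlamaMLP:
    def __init__(self, config):
        self.hidden_size = config.hidden_size

    def forward(self, x):
        return x
'''

# Stand-in for what a contributor would write in a modular file.
MODULAR_SRC = '''
class GemmaMLP(LlamaMLP):
    def forward(self, x):
        return x * 2
'''


def flatten_one_level(parent_src: str, child_src: str) -> str:
    """Copy the parent's methods into the child unless the child overrides them."""
    parent = ast.parse(parent_src).body[0]
    child = ast.parse(child_src).body[0]
    overridden = {node.name for node in child.body if isinstance(node, ast.FunctionDef)}
    inherited = [
        node
        for node in parent.body
        if isinstance(node, ast.FunctionDef) and node.name not in overridden
    ]
    child.body = inherited + child.body
    # The flattened class takes over the parent's own bases (none in this toy case),
    # so it no longer refers to the neighbouring model.
    child.bases = parent.bases
    return ast.unparse(child)


flattened = flatten_one_level(PARENT_SRC, MODULAR_SRC)
print(flattened)
```

The printed class is standalone: it carries the inherited `__init__` and the overridden `forward`, which is the shape a "single model, single file" module needs.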
diff --git a/examples/diff-conversion/README.md b/examples/diff-conversion/README.md
diff --git a/examples/modular-transformers/README.md b/examples/modular-transformers/README.md
@@ -0,0 +1,20 @@
+# Using the `modular_converter` linter
+
+`pip install libcst` is a must!
+
+# `sh examples/modular-transformers/convert_examples.sh` to get the converted outputs
+
+The modular converter is a new `linter` specific to `transformers`. It allows us to unpack inheritance in Python to convert a modular file like `modular_gemma.py` into a `single model single file`.
+
+Usage examples are available in the `examples/modular-transformers` directory; see `modular_gemma` for a full model example.
+
+`python utils/modular_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/modular-transformers/modular_my_new_model2.py"`
+
+## How it works
+We use the `libcst` parser to produce a concrete syntax tree (CST) representation of the `modular_xxx.py` file. For any imports that are made from `transformers.models.modeling_xxxx`, we parse the source code of that module and build a class dependency mapping, which allows us to unpack the dependencies.
+
+The code from the `modular` file and the class dependency mapping are "merged" to produce the single model single file. 
+We use ruff to automatically remove the potential duplicate imports.
+
+## Why do we use libcst instead of the native AST?
+Python's built-in `ast` module is powerful, but it does not preserve comments or the original code formatting (including how docstrings are laid out). We therefore decided to go with `libcst`.
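
A quick way to see this trade-off, using only the standard library: round-tripping source through `ast.parse` and `ast.unparse` (available from Python 3.9) drops comments and normalises spacing. This sketch only demonstrates the `ast` side; `libcst` is not used here, and the `scale` function is an invented example.

```python
import ast

source = '''\
def scale(x):
    # keep this comment
    return x   *   2
'''

# Parse to an abstract syntax tree, then regenerate source from it.
round_tripped = ast.unparse(ast.parse(source))
print(round_tripped)
```

The comment and the original spacing do not survive the round trip, which would be a problem for a tool whose output is meant to be read and maintained as regular modeling code.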
diff --git a/examples/modular-transformers/configuration_dummy.py b/examples/modular-transformers/configuration_dummy.py