TEAMS: Add TensorFlow 2 Model Garden Conversion Script #25177
Hi,
with this PR, a TEAMS model pretrained with the TensorFlow Model Garden can be converted into an ELECTRA-compatible model.
The TEAMS model was proposed in the paper "Training ELECTRA Augmented with Multi-word Selection", which was accepted at ACL 2021.
The TEAMS implementation can be found in the TensorFlow Model Garden repository.
Unfortunately, the authors did not release any pretrained models.
However, I pretrained a TEAMS model on German Wikipedia and released all checkpoints on the Hugging Face Model Hub. Additionally, this PR includes the conversion script needed to integrate pretrained TEAMS models into Transformers.
Closes #16466.
Implementation Details
TEAMS uses the same architecture as ELECTRA (only the pretraining approach differs). ELECTRA in Transformers comes with two models: a generator and a discriminator.
In contrast to ELECTRA, the TEAMS generator shares layers with the discriminator.
More precisely, the sharing of layers can be seen in the reference implementation:
https://github.com/tensorflow/models/blob/master/official/projects/teams/teams_task.py#L48
This shows that the generator re-uses the first n layers of the discriminator, where n is usually half of the specified total number of layers, as sketched below.
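As a rough illustration (a sketch only, with assumed layer counts, not the actual Model Garden code):

```python
# Conceptual sketch of the TEAMS layer sharing -- assumed layer counts,
# not the actual Model Garden code.
num_discriminator_layers = 12            # e.g. a base-size model (assumed)
n = num_discriminator_layers // 2        # layers re-used by the generator

discriminator_layers = list(range(num_discriminator_layers))
generator_layers = discriminator_layers[:n]  # shared weights, not copies
```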
Retrieving TensorFlow 2 Checkpoints
In order to test the conversion script, the original TensorFlow 2 checkpoints need to be downloaded from the Model Hub:
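A minimal sketch using the huggingface_hub library; the repo id is a placeholder, since the exact Hub repository is not named here:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual TEAMS checkpoint repository.
checkpoint_dir = snapshot_download(repo_id="<hub-user>/teams-base-german-tf2")
print(checkpoint_dir)  # local directory with the original TF2 checkpoint files
```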
Additionally, to test the model locally, we need to download the tokenizer:
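For example (the repo id is again a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder repo id -- substitute the actual tokenizer repository on the Hub.
tokenizer = AutoTokenizer.from_pretrained("<hub-user>/teams-base-german")
```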
Converting TEAMS Generator
After retrieving the original checkpoints, the generator configuration must be downloaded:
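A sketch with hf_hub_download; the repo id and filename are assumptions:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id and assumed filename for the generator configuration.
generator_config = hf_hub_download(
    repo_id="<hub-user>/teams-base-german-tf2", filename="generator_config.json"
)
```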
After that, the conversion script can be run to convert the TEAMS generator part into an ELECTRA generator:
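A sketch of the invocation, assuming the new script follows the interface of the existing ELECTRA conversion script in Transformers (the script name and flags may differ):

```python
import subprocess

# Hypothetical script name and flags -- check the script added in this PR
# for the exact interface.
subprocess.run(
    [
        "python", "convert_teams_original_tf2_checkpoint_to_pytorch.py",
        "--tf_checkpoint_path", checkpoint_dir,
        "--config_file", generator_config,
        "--pytorch_dump_path", "./teams-german-generator",
        "--discriminator_or_generator", "generator",
    ],
    check=True,
)
```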
The generator can be tested with the fill-mask pipeline to predict a masked word. The German example below should predict the capital city of Finland, which is Helsinki:
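A minimal example, assuming the converted generator was saved to ./teams-german-generator and using the tokenizer repo placeholder from above:

```python
from transformers import pipeline

# Paths and repo id are assumptions carried over from the previous steps.
fill_mask = pipeline(
    "fill-mask",
    model="./teams-german-generator",
    tokenizer="<hub-user>/teams-base-german",
)

# German for: "The capital of Finland is [MASK]."
# The top prediction should be "Helsinki".
print(fill_mask("Die Hauptstadt von Finnland ist [MASK]."))
```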
Converting TEAMS Discriminator
After retrieving the original checkpoints, the discriminator configuration must be downloaded. After that, the conversion script can be run to convert the TEAMS discriminator part into an ELECTRA discriminator; both steps mirror the generator conversion above:
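A sketch of both steps, with the same placeholder repo id and assumed script interface as in the generator section:

```python
from huggingface_hub import hf_hub_download
import subprocess

# Placeholder repo id and assumed filename for the discriminator configuration.
discriminator_config = hf_hub_download(
    repo_id="<hub-user>/teams-base-german-tf2", filename="discriminator_config.json"
)

# Hypothetical script name and flags, mirroring the generator conversion above.
subprocess.run(
    [
        "python", "convert_teams_original_tf2_checkpoint_to_pytorch.py",
        "--tf_checkpoint_path", checkpoint_dir,
        "--config_file", discriminator_config,
        "--pytorch_dump_path", "./teams-german-discriminator",
        "--discriminator_or_generator", "discriminator",
    ],
    check=True,
)
```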
I ran experiments on downstream tasks (such as NER and text classification), and the results are superior to those of comparable BERT models (original BERT and Token Dropping BERT).
Made with 🥨 and ❤️.