
TEAMS: Add TensorFlow 2 Model Garden Conversion Script #25177

Conversation

@stefan-it (Collaborator) commented on Jul 28, 2023

Hi,

with this PR, a TEAMS model pretrained with the TensorFlow Model Garden can be converted into an ELECTRA-compatible model.

The TEAMS model was proposed in the paper "Training ELECTRA Augmented with Multi-word Selection", which was accepted at ACL 2021:

A new text encoder pre-training method is presented that improves ELECTRA based on multi-task learning and develops two techniques to effectively combine all pre-training tasks: using attention-based networks for task-specific heads, and sharing bottom layers of the generator and the discriminator.

The TEAMS implementation can be found in the TensorFlow Models Garden repository.

Unfortunately, the authors did not release any pretrained models.

However, I pretrained a TEAMS model on German Wikipedia and released all checkpoints on the Hugging Face Model Hub. Additionally, the conversion script needed to integrate pretrained TEAMS checkpoints into Transformers is included in this PR.

Closes #16466.

Implementation Details

TEAMS uses the same architecture as ELECTRA (only the pretraining approach differs). ELECTRA in Transformers comes with two models: a generator and a discriminator.

In contrast to ELECTRA, the TEAMS generator shares layers with the discriminator:

Our study confirms this observation and finds that sharing some transformer layers of the generator and discriminator can further boost the model performance. More specifically, we design the generator to have the same “width” (i.e., hidden size, intermediate size and number of heads) as the discriminator and share the bottom half of all transformer layers between the generator and the discriminator.

More precisely, the sharing of layers can be seen in the reference implementation:

https://github.com/tensorflow/models/blob/master/official/projects/teams/teams_task.py#L48

This shows that the generator reuses the first n layers of the discriminator (where n is usually half of the specified total number of layers).

[Screenshot: layer sharing in teams_task.py of the reference implementation]
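
For illustration, the sharing can be sketched in a few lines of PyTorch. This is a conceptual sketch only, not the Model Garden code; the class and argument names are made up for this example. The generator stacks the bottom half of the discriminator's transformer layers and only owns its MLM head:

import torch.nn as nn

class SharedLayerGenerator(nn.Module):
    """Conceptual sketch of TEAMS layer sharing (names are illustrative only).

    The generator reuses the bottom half of the discriminator's transformer
    layers, so both pretraining tasks update the same parameters; only the
    masked-language-modeling head belongs to the generator alone.
    """

    def __init__(self, discriminator_layers: nn.ModuleList, hidden_size: int, vocab_size: int):
        super().__init__()
        num_shared = len(discriminator_layers) // 2
        # Shared bottom layers: the same module objects as in the discriminator, not copies.
        self.shared_layers = discriminator_layers[:num_shared]
        # Task-specific head owned by the generator.
        self.mlm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        for layer in self.shared_layers:
            hidden_states = layer(hidden_states)
        return self.mlm_head(hidden_states)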

Retrieving TensorFlow 2 Checkpoints

In order to test the conversion script, the original TensorFlow 2 checkpoints need to be downloaded from the Model Hub:

$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-generator/resolve/main/ckpt-1000000.data-00000-of-00001
$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-generator/resolve/main/ckpt-1000000.index

Additionally, to test the model locally, the tokenizer files need to be downloaded:

$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-generator/resolve/main/tokenizer_config.json
$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-generator/resolve/main/vocab.txt

Converting TEAMS Generator

After retrieving the original checkpoints, the generator configuration must be downloaded:

$ mkdir generator && cd $_
$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-generator/resolve/main/config.json
$ cd ..

After that, the conversion script can be run to convert the TEAMS generator part into an ELECTRA generator:

$ python3 convert_teams_original_tf2_checkpoint_to_pytorch.py \
    --tf_checkpoint_path ckpt-1000000 \
    --config_file ./generator/config.json \
    --pytorch_dump_path ./exported-generator \
    --discriminator_or_generator generator
$ cp tokenizer_config.json exported-generator
$ cp vocab.txt exported-generator

The generator can be tested with the fill-mask pipeline to predict the masked token:

from transformers import pipeline

predictor = pipeline("fill-mask", model="./exported-generator", tokenizer="./exported-generator")
predictor("Die Hauptstadt von Finnland ist [MASK].")

For this German example, the model should predict the capital of Finland, which is Helsinki:

[{'score': 0.971819281578064,
  'token': 16014,
  'token_str': 'Helsinki',
  'sequence': 'Die Hauptstadt von Finnland ist Helsinki.'},
 {'score': 0.006745012942701578,
  'token': 12388,
  'token_str': 'Stockholm',
  'sequence': 'Die Hauptstadt von Finnland ist Stockholm.'},
 {'score': 0.003258457174524665,
  'token': 12227,
  'token_str': 'Finnland',
  'sequence': 'Die Hauptstadt von Finnland ist Finnland.'},
 {'score': 0.0025941277854144573,
  'token': 23596,
  'token_str': 'Tallinn',
  'sequence': 'Die Hauptstadt von Finnland ist Tallinn.'},
 {'score': 0.0014661155873909593,
  'token': 17408,
  'token_str': 'Riga',
  'sequence': 'Die Hauptstadt von Finnland ist Riga.'}]

Converting TEAMS Discriminator

After retrieving the original checkpoints, the discriminator configuration must be downloaded:

$ mkdir discriminator && cd $_
$ wget https://huggingface.co/gwlms/teams-base-dewiki-v1-discriminator/resolve/main/config.json
$ cd ..

After that, the conversion script can be run to convert the TEAMS discriminator part into an ELECTRA discriminator:

$ python3 convert_teams_original_tf2_checkpoint_to_pytorch.py \
    --tf_checkpoint_path ckpt-1000000 \
    --config_file ./discriminator/config.json \
    --pytorch_dump_path ./exported-discriminator \
    --discriminator_or_generator discriminator
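
Analogous to the generator, the exported discriminator can be sanity-checked locally. A minimal sketch (assuming the tokenizer files are also copied into ./exported-discriminator, as was done for the generator): the ELECTRA-style replaced-token-detection head should assign a high "replaced" score to a wrong token such as "Stockholm":

import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("./exported-discriminator")
model = ElectraForPreTraining.from_pretrained("./exported-discriminator")

# "Stockholm" deliberately replaces the correct "Helsinki" so the
# replaced-token-detection head has something to flag.
inputs = tokenizer("Die Hauptstadt von Finnland ist Stockholm.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]

# Positive logits mean the token is predicted as "replaced".
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), logits):
    print(f"{token:>12}  {score.item():+.2f}")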

I ran experiments on downstream tasks (such as NER and text classification), and the results are superior to those of comparable BERT models (original BERT and Token Dropping BERT).

Made with 🥨 and ❤️.

@stefan-it stefan-it changed the title electra: add TEAMS TensorFlow 2 conversion script ELECTRA: Add TEAMS TensorFlow 2 conversion script Jul 28, 2023
@stefan-it stefan-it changed the title ELECTRA: Add TEAMS TensorFlow 2 conversion script TEAMS: Add TensorFlow 2 Model Garden Conversion Script Jul 28, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts (Collaborator)

cc @Rocketknight1

@stefan-it (Collaborator, Author)

Please unstale 🤖

@huggingface huggingface deleted a comment from github-actions bot Aug 28, 2023
@Rocketknight1 (Member)

No stale yet, please!

@huggingface huggingface deleted a comment from github-actions bot Sep 26, 2023
@stefan-it (Collaborator, Author)

Please unstale bot 😄

@huggingface huggingface deleted a comment from github-actions bot Oct 23, 2023
@github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Nov 25, 2023
@stefan-it stefan-it reopened this Nov 25, 2023
@github-actions github-actions bot closed this Dec 4, 2023
@stefan-it stefan-it reopened this Dec 4, 2023
@github-actions github-actions bot closed this Dec 13, 2023
@amyeroberts amyeroberts reopened this Dec 13, 2023
@github-actions github-actions bot closed this Dec 22, 2023
@amyeroberts amyeroberts reopened this Dec 22, 2023
@github-actions github-actions bot closed this Dec 31, 2023