diff --git a/docs/source/es/converting_tensorflow_models.md b/docs/source/es/converting_tensorflow_models.md
index f56eb02d87006a..efd47e29dc29c6 100644
--- a/docs/source/es/converting_tensorflow_models.md
+++ b/docs/source/es/converting_tensorflow_models.md
@@ -89,7 +89,7 @@ Aquí hay un ejemplo del proceso para convertir un modelo OpenAI GPT-2 pre-entre
 ```bash
 export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/openai-community/gpt2/pretrained/weights

-transformers-cli convert --model_type openai-community/gpt2 \
+transformers-cli convert --model_type gpt2 \
   --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
   --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
   [--config OPENAI_GPT2_CONFIG] \
diff --git a/docs/source/it/converting_tensorflow_models.md b/docs/source/it/converting_tensorflow_models.md
index b1de0113388254..8280e58cc9208b 100644
--- a/docs/source/it/converting_tensorflow_models.md
+++ b/docs/source/it/converting_tensorflow_models.md
@@ -13,12 +13,12 @@ rendered properly in your Markdown viewer.

 # Convertire checkpoint di Tensorflow

-È disponibile un'interfaccia a linea di comando per convertire gli originali checkpoint di Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM
+È disponibile un'interfaccia a linea di comando per convertire gli originali checkpoint di Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM
 in modelli che possono essere caricati utilizzando i metodi `from_pretrained` della libreria.

-A partire dalla versione 2.3.0 lo script di conversione è parte di transformers CLI (**transformers-cli**), disponibile in ogni installazione
+A partire dalla versione 2.3.0 lo script di conversione è parte di transformers CLI (**transformers-cli**), disponibile in ogni installazione
 di transformers >=2.3.0.

 La seguente documentazione riflette il formato dei comandi di **transformers-cli convert**.
@@ -27,19 +27,19 @@ La seguente documentazione riflette il formato dei comandi di **transformers-cli

 ## BERT

-Puoi convertire qualunque checkpoint Tensorflow di BERT (in particolare
-[i modeli pre-allenati rilasciati da Google](https://github.com/google-research/bert#pre-trained-models))
-in un file di salvataggio Pytorch utilizzando lo script
+Puoi convertire qualunque checkpoint Tensorflow di BERT (in particolare
+[i modeli pre-allenati rilasciati da Google](https://github.com/google-research/bert#pre-trained-models))
+in un file di salvataggio Pytorch utilizzando lo script
 [convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py).

-Questo CLI prende come input un checkpoint di Tensorflow (tre files che iniziano con `bert_model.ckpt`) ed il relativo
+Questo CLI prende come input un checkpoint di Tensorflow (tre files che iniziano con `bert_model.ckpt`) ed il relativo
 file di configurazione (`bert_config.json`), crea un modello Pytorch per questa configurazione, carica i pesi dal
-checkpoint di Tensorflow nel modello di Pytorch e salva il modello che ne risulta in un file di salvataggio standard di Pytorch che
+checkpoint di Tensorflow nel modello di Pytorch e salva il modello che ne risulta in un file di salvataggio standard di Pytorch che
 può essere importato utilizzando `from_pretrained()` (vedi l'esempio nel [quicktour](quicktour) ,
 [run_glue.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_glue.py) ).

-Devi soltanto lanciare questo script di conversione **una volta** per ottenere un modello Pytorch. Dopodichè, potrai tralasciare
-il checkpoint di Tensorflow (i tre files che iniziano con `bert_model.ckpt`), ma assicurati di tenere il file di configurazione
+Devi soltanto lanciare questo script di conversione **una volta** per ottenere un modello Pytorch. Dopodichè, potrai tralasciare
+il checkpoint di Tensorflow (i tre files che iniziano con `bert_model.ckpt`), ma assicurati di tenere il file di configurazione
 (`bert_config.json`) ed il file di vocabolario (`vocab.txt`) in quanto queste componenti sono necessarie anche per il modello di Pytorch.

 Per lanciare questo specifico script di conversione avrai bisogno di un'installazione di Tensorflow e di Pytorch
@@ -59,11 +59,11 @@ Puoi scaricare i modelli pre-allenati di Google per la conversione [qua](https:/

 ## ALBERT

-Per il modello ALBERT, converti checkpoint di Tensoflow in Pytorch utilizzando lo script
+Per il modello ALBERT, converti checkpoint di Tensoflow in Pytorch utilizzando lo script
 [convert_albert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/main/src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py).

-Il CLI prende come input un checkpoint di Tensorflow (tre files che iniziano con `model.ckpt-best`) e i relativi file di
-configurazione (`albert_config.json`), dopodichè crea e salva un modello Pytorch. Per lanciare questa conversione
+Il CLI prende come input un checkpoint di Tensorflow (tre files che iniziano con `model.ckpt-best`) e i relativi file di
+configurazione (`albert_config.json`), dopodichè crea e salva un modello Pytorch. Per lanciare questa conversione
 avrai bisogno di un'installazione di Tensorflow e di Pytorch.

 Ecco un esempio del procedimento di conversione di un modello `ALBERT Base` pre-allenato:
@@ -97,7 +97,7 @@ Ecco un esempio del processo di conversione di un modello OpenAI GPT-2 pre-allen
 ```bash
 export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/openai-community/gpt2/pretrained/weights

-transformers-cli convert --model_type openai-community/gpt2 \
+transformers-cli convert --model_type gpt2 \
   --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
   --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
   [--config OPENAI_GPT2_CONFIG] \
diff --git a/docs/source/pt/converting_tensorflow_models.md b/docs/source/pt/converting_tensorflow_models.md
index 190c1aec5b22bf..63e926d4dcfe6c 100644
--- a/docs/source/pt/converting_tensorflow_models.md
+++ b/docs/source/pt/converting_tensorflow_models.md
@@ -30,11 +30,11 @@ A documentação abaixo reflete o formato do comando **transformers-cli convert*

 ## BERT

-Você pode converter qualquer checkpoint do BERT em TensorFlow (em particular [os modelos pré-treinados lançados pelo Google](https://github.com/google-research/bert#pre-trained-models)) em um arquivo PyTorch usando um
+Você pode converter qualquer checkpoint do BERT em TensorFlow (em particular [os modelos pré-treinados lançados pelo Google](https://github.com/google-research/bert#pre-trained-models)) em um arquivo PyTorch usando um
 [convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py) script.

 Esta Interface de Linha de Comando (CLI) recebe como entrada um checkpoint do TensorFlow (três arquivos começando com `bert_model.ckpt`) e o
-arquivo de configuração (`bert_config.json`), e então cria um modelo PyTorch para esta configuração, carrega os pesos
+arquivo de configuração (`bert_config.json`), e então cria um modelo PyTorch para esta configuração, carrega os pesos
 do checkpoint do TensorFlow no modelo PyTorch e salva o modelo resultante em um arquivo PyTorch que pode ser
 importado usando `from_pretrained()` (veja o exemplo em [quicktour](quicktour) , [run_glue.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_glue.py) ).
@@ -102,7 +102,7 @@ Aqui está um exemplo do processo de conversão para um modelo OpenAI GPT-2 pré
 ```bash
 export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/openai-community/gpt2/pretrained/weights

-transformers-cli convert --model_type openai-community/gpt2 \
+transformers-cli convert --model_type gpt2 \
   --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
   --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
   [--config OPENAI_GPT2_CONFIG] \
diff --git a/examples/README.md b/examples/README.md
index a38b4576b35fd3..ac2cc048d13cec 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -17,7 +17,7 @@ limitations under the License.

 We host a wide range of example scripts for multiple learning frameworks. Simply choose your favorite: [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow), [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch) or [JAX/Flax](https://github.com/huggingface/transformers/tree/main/examples/flax).

-We also have some [research projects](https://github.com/huggingface/transformers/tree/main/examples/research_projects), as well as some [legacy examples](https://github.com/huggingface/transformers/tree/main/examples/legacy). Note that unlike the main examples these are not actively maintained, and may require specific older versions of dependencies in order to run.
+We also have some [research projects](https://github.com/huggingface/transformers/tree/main/examples/research_projects), as well as some [legacy examples](https://github.com/huggingface/transformers/tree/main/examples/legacy). Note that unlike the main examples these are not actively maintained, and may require specific older versions of dependencies in order to run.

 While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the-box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data, allowing you to tweak and edit them as required.
@@ -97,16 +97,16 @@ and run the example command as usual afterward.

 ## Running the Examples on Remote Hardware with Auto-Setup

-[run_on_remote.py](./run_on_remote.py) is a script that launches any example on remote self-hosted hardware,
-with automatic hardware and environment setup. It uses [Runhouse](https://github.com/run-house/runhouse) to launch
-on self-hosted hardware (e.g. in your own cloud account or on-premise cluster) but there are other options
-for running remotely as well. You can easily customize the example used, command line arguments, dependencies,
+[run_on_remote.py](./run_on_remote.py) is a script that launches any example on remote self-hosted hardware,
+with automatic hardware and environment setup. It uses [Runhouse](https://github.com/run-house/runhouse) to launch
+on self-hosted hardware (e.g. in your own cloud account or on-premise cluster) but there are other options
+for running remotely as well. You can easily customize the example used, command line arguments, dependencies,
 and type of compute hardware, and then run the script to automatically launch the example.

-You can refer to
+You can refer to
 [hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/latest/api/python/cluster.html#hardware-setup)
 for more information about hardware and dependency setup with Runhouse, or this
-[Colab tutorial](https://colab.research.google.com/drive/1sh_aNQzJX5BKAdNeXthTNGxKz7sM9VPc) for a more in-depth
+[Colab tutorial](https://colab.research.google.com/drive/1sh_aNQzJX5BKAdNeXthTNGxKz7sM9VPc) for a more in-depth
 walkthrough. You can run the script with the following commands:
@@ -118,7 +118,7 @@ pip install runhouse

 # For an on-demand V100 with whichever cloud provider you have configured:
 python run_on_remote.py \
     --example pytorch/text-generation/run_generation.py \
-    --model_type=openai-community/gpt2 \
+    --model_type=gpt2 \
     --model_name_or_path=openai-community/gpt2 \
     --prompt "I am a language model and"
diff --git a/examples/flax/language-modeling/README.md b/examples/flax/language-modeling/README.md
index 324c560ea4a7f3..9b95d9ec0911bd 100644
--- a/examples/flax/language-modeling/README.md
+++ b/examples/flax/language-modeling/README.md
@@ -16,7 +16,7 @@ limitations under the License.

 # Language model training examples

-The following example showcases how to train a language model from scratch
+The following example showcases how to train a language model from scratch
 using the JAX/Flax backend.

 JAX/Flax allows you to trace pure functions and compile them into efficient, fused accelerator code on both GPU and TPU.
@@ -25,9 +25,9 @@ way which enables simple and efficient model parallelism.

 ## Masked language modeling

-In the following, we demonstrate how to train a bi-directional transformer model
+In the following, we demonstrate how to train a bi-directional transformer model
 using masked language modeling objective as introduced in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
-More specifically, we demonstrate how JAX/Flax can be leveraged
+More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`FacebookAI/roberta-base`**](https://huggingface.co/FacebookAI/roberta-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -75,7 +75,7 @@ tokenizer.save("./norwegian-roberta-base/tokenizer.json")

 ### Create configuration

-Next, we create the model's configuration file. This is as simple
+Next, we create the model's configuration file. This is as simple
 as loading and storing [`**FacebookAI/roberta-base**`](https://huggingface.co/FacebookAI/roberta-base)
 in the local model folder:
@@ -117,17 +117,17 @@ python run_mlm_flax.py \
     --push_to_hub
 ```

-Training should converge at a loss and accuracy
+Training should converge at a loss and accuracy
 of 1.78 and 0.64 respectively after 18 epochs on a single TPUv3-8.
 This should take less than 18 hours.
 Training statistics can be accessed on [tfhub.dev](https://tensorboard.dev/experiment/GdYmdak2TWeVz0DDRYOrrg).

-For a step-by-step walkthrough of how to do masked language modeling in Flax, please have a
+For a step-by-step walkthrough of how to do masked language modeling in Flax, please have a
 look at [this](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb) google colab.

 ## Causal language modeling

-In the following, we demonstrate how to train an auto-regressive causal transformer model
+In the following, we demonstrate how to train an auto-regressive causal transformer model
 in JAX/Flax.
 More specifically, we pretrain a randomly initialized [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2) model in Norwegian on a single TPUv3-8.
 to pre-train 124M [**`openai-community/gpt2`**](https://huggingface.co/openai-community/gpt2)
@@ -178,7 +178,7 @@ tokenizer.save("./norwegian-gpt2/tokenizer.json")

 ### Create configuration

-Next, we create the model's configuration file. This is as simple
+Next, we create the model's configuration file. This is as simple
 as loading and storing [`**openai-community/gpt2**`](https://huggingface.co/openai-community/gpt2)
 in the local model folder:
@@ -199,7 +199,7 @@ Finally, we can run the example script to pretrain the model:
 ```bash
 python run_clm_flax.py \
     --output_dir="./norwegian-gpt2" \
-    --model_type="openai-community/gpt2" \
+    --model_type="gpt2" \
     --config_name="./norwegian-gpt2" \
     --tokenizer_name="./norwegian-gpt2" \
     --dataset_name="oscar" \
@@ -218,19 +218,19 @@ python run_clm_flax.py \
     --push_to_hub
 ```

-Training should converge at a loss and perplexity
+Training should converge at a loss and perplexity
 of 3.24 and 25.72 respectively after 20 epochs on a single TPUv3-8.
 This should take less than ~21 hours.
 Training statistics can be accessed on [tfhub.de](https://tensorboard.dev/experiment/2zEhLwJ0Qp2FAkI3WVH9qA).

-For a step-by-step walkthrough of how to do causal language modeling in Flax, please have a
+For a step-by-step walkthrough of how to do causal language modeling in Flax, please have a
 look at [this](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/causal_language_modeling_flax.ipynb) google colab.

 ## T5-like span-masked language modeling

-In the following, we demonstrate how to train a T5 model using the span-masked language model
+In the following, we demonstrate how to train a T5 model using the span-masked language model
 objective as proposed in the [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).
-More specifically, we demonstrate how JAX/Flax can be leveraged
+More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`google/t5-v1_1-base`**](https://huggingface.co/google/t5-v1_1-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -247,9 +247,9 @@ cd ./norwegian-t5-base

 ### Train tokenizer

-In the first step, we train a tokenizer to efficiently process the text input for the model.
-We make use of the [tokenizers](https://github.com/huggingface/tokenizers) library to train
-a sentencepiece unigram tokenizer as shown in [t5_tokenizer_model.py](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling/t5_tokenizer_model.py)
+In the first step, we train a tokenizer to efficiently process the text input for the model.
+We make use of the [tokenizers](https://github.com/huggingface/tokenizers) library to train
+a sentencepiece unigram tokenizer as shown in [t5_tokenizer_model.py](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling/t5_tokenizer_model.py)
 which is heavily inspired from [yandex-research/DeDLOC's tokenizer model](https://github.com/yandex-research/DeDLOC/blob/5c994bc64e573702a9a79add3ecd68b38f14b548/sahajbert/tokenizer/tokenizer_model.py) .

 The tokenizer is trained on the complete Norwegian dataset of OSCAR
@@ -293,7 +293,7 @@ tokenizer.save("./norwegian-t5-base/tokenizer.json")

 ### Create configuration

-Next, we create the model's configuration file. This is as simple
+Next, we create the model's configuration file. This is as simple
 as loading and storing [`**google/t5-v1_1-base**`](https://huggingface.co/google/t5-v1_1-base)
 in the local model folder:
@@ -333,16 +333,16 @@ python run_t5_mlm_flax.py \
     --push_to_hub
 ```

-Training should converge at a loss and accuracy
+Training should converge at a loss and accuracy
 of 2.36 and 57.0 respectively after 3 epochs on a single TPUv3-8.
 This should take around 4.5 hours.
 Training statistics can be accessed on directly on the 🤗 [hub](https://huggingface.co/patrickvonplaten/t5-base-norwegian/tensorboard)

 ## BART: Denoising language modeling

-In the following, we demonstrate how to train a BART model
+In the following, we demonstrate how to train a BART model
 using denoising language modeling objective as introduced in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461).
-More specifically, we demonstrate how JAX/Flax can be leveraged
+More specifically, we demonstrate how JAX/Flax can be leveraged
 to pre-train [**`bart-base`**](https://huggingface.co/facebook/bart-base)
 in Norwegian on a single TPUv3-8 pod.
@@ -389,7 +389,7 @@ tokenizer.save("./norwegian-bart-base/tokenizer.json")

 ### Create configuration

-Next, we create the model's configuration file. This is as simple
+Next, we create the model's configuration file. This is as simple
 as loading and storing [`**facebook/bart-base**`](https://huggingface.co/facebook/bart-base)
 in the local model folder:
@@ -425,7 +425,7 @@ python run_bart_dlm_flax.py \
     --push_to_hub
 ```

-Training should converge at a loss and accuracy
+Training should converge at a loss and accuracy
 of 1.36 and 0.77 respectively after 3 epochs on a single TPUv3-8.
 This should take less than 6 hours.
 Training statistics can be accessed on [tfhub.dev](https://tensorboard.dev/experiment/Maw62QlaSXWS0MOf2V2lbg/).
@@ -440,14 +440,14 @@ For reproducibility, we state the training commands used for PyTorch/XLA and PyT
 |-------|-----------|------------|------------|
 | MLM   |  15h32m   |  23h46m    |  44h14m    |

-*All experiments are ran on Google Cloud Platform.
+*All experiments are ran on Google Cloud Platform.
 GPU experiments are ran without further optimizations besides JAX
 transformations. GPU experiments are ran with full precision (fp32). "TPU v3-8"
 are 8 TPU cores on 4 chips (each chips has 2 cores), while "8 GPU" are 8 GPU chips.

 ### Script to run MLM with PyTorch/XLA on TPUv3-8

-For comparison one can run the same pre-training with PyTorch/XLA on TPU. To set up PyTorch/XLA on Cloud TPU VMs, please
+For comparison one can run the same pre-training with PyTorch/XLA on TPU. To set up PyTorch/XLA on Cloud TPU VMs, please
 refer to [this](https://cloud.google.com/tpu/docs/pytorch-xla-ug-tpu-vm) guide.

 Having created the tokenizer and configuration in `norwegian-roberta-base`, we create the following symbolic links:
@@ -497,7 +497,7 @@ python3 xla_spawn.py --num_cores ${NUM_TPUS} run_mlm.py --output_dir="./runs" \

 ### Script to compare pre-training with PyTorch on 8 GPU V100's

-For comparison you can run the same pre-training with PyTorch on GPU. Note that we have to make use of `gradient_accumulation`
+For comparison you can run the same pre-training with PyTorch on GPU. Note that we have to make use of `gradient_accumulation`
 because the maximum batch size that fits on a single V100 GPU is 32 instead of 128.

 Having created the tokenizer and configuration in `norwegian-roberta-base`, we create the following symbolic links:
diff --git a/examples/pytorch/language-modeling/README.md b/examples/pytorch/language-modeling/README.md
index 3a209584acc522..b13cebde5f5796 100644
--- a/examples/pytorch/language-modeling/README.md
+++ b/examples/pytorch/language-modeling/README.md
@@ -239,7 +239,7 @@ When training a model from scratch, configuration values may be overridden with

 ```bash
-python run_clm.py --model_type openai-community/gpt2 --tokenizer_name openai-community/gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
+python run_clm.py --model_type gpt2 --tokenizer_name openai-community/gpt2 \ --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=102" \
 [...]
 ```
diff --git a/examples/pytorch/text-generation/README.md b/examples/pytorch/text-generation/README.md
index e619c25e162d52..72fc25e13c65e7 100644
--- a/examples/pytorch/text-generation/README.md
+++ b/examples/pytorch/text-generation/README.md
@@ -26,6 +26,6 @@ Example usage:

 ```bash
 python run_generation.py \
-    --model_type=openai-community/gpt2 \
+    --model_type=gpt2 \
     --model_name_or_path=openai-community/gpt2
 ```