From 5adfeaccf9a70cf8ad25eb0d3a0826a6665ac8d2 Mon Sep 17 00:00:00 2001 From: Diana Liskovich Date: Mon, 20 Sep 2021 08:04:06 -0700 Subject: [PATCH] Rename references from master -> main in preparation for branch name change (#2297) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Summary: # Before submitting - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements) - [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)? - [ ] Did you make sure to update the docs? - [ ] Did you write any new necessary tests? ## What does this PR do? Fixes # (issue). ## PR review Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged. ## Did you have fun? Make sure you had fun coding � Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2297 Reviewed By: alexeib Differential Revision: D30906090 Pulled By: dianaml0 fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070 --- .github/ISSUE_TEMPLATE/bug_report.md | 4 ++-- .github/ISSUE_TEMPLATE/how-to-question.md | 10 +++++----- .github/PULL_REQUEST_TEMPLATE.md | 10 +++++----- .github/workflows/build.yml | 4 ++-- CONTRIBUTING.md | 2 +- README.md | 6 +++--- docs/conf.py | 2 +- examples/adaptive_span/README.md | 2 +- examples/constrained_decoding/README.md | 2 +- .../discriminative_reranking_nmt/README.md | 2 +- examples/fast_noisy_channel/README.md | 4 ++-- examples/layerdrop/README.md | 6 +++--- examples/m2m_100/README.md | 2 +- examples/multilingual/README.md | 6 +++--- examples/quant_noise/README.md | 20 +++++++++---------- examples/roberta/README.md | 8 ++++---- examples/roberta/commonsense_qa/README.md | 2 +- examples/shuffled_word_order/README.md | 6 +++--- .../speech_synthesis/docs/ljspeech_example.md | 4 ++-- examples/textless_nlp/gslm/README.md | 4 ++-- examples/wav2vec/unsupervised/README.md | 4 ++-- fairseq/models/bart/hub_interface.py | 2 +- fairseq/models/roberta/hub_interface.py | 2 +- 23 files changed, 57 insertions(+), 57 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index a7f4f0a902..aa15123d8e 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -19,7 +19,7 @@ Steps to reproduce the behavior (**always include the command you ran**): #### Code sample - ### Expected behavior @@ -28,7 +28,7 @@ Minimal means having the shortest code but still preserving the bug. --> ### Environment - - fairseq Version (e.g., 1.0 or master): + - fairseq Version (e.g., 1.0 or main): - PyTorch Version (e.g., 1.0) - OS (e.g., Linux): - How you installed fairseq (`pip`, source): diff --git a/.github/ISSUE_TEMPLATE/how-to-question.md b/.github/ISSUE_TEMPLATE/how-to-question.md index 4beb180dbf..04f3f15d3e 100644 --- a/.github/ISSUE_TEMPLATE/how-to-question.md +++ b/.github/ISSUE_TEMPLATE/how-to-question.md @@ -6,9 +6,9 @@ labels: 'question, needs triage' ## ❓ Questions and Help -### Before asking: -1. search the issues. -2. search the docs. +### Before asking: +1. search the issues. +2. search the docs. @@ -16,13 +16,13 @@ labels: 'question, needs triage' #### Code - + #### What have you tried? #### What's your environment? 
- - fairseq Version (e.g., 1.0 or master): + - fairseq Version (e.g., 1.0 or main): - PyTorch Version (e.g., 1.0) - OS (e.g., Linux): - How you installed fairseq (`pip`, source): diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index b28ff98e7b..d005e2df4f 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,15 +1,15 @@ # Before submitting - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements) -- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)? -- [ ] Did you make sure to update the docs? -- [ ] Did you write any new necessary tests? +- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)? +- [ ] Did you make sure to update the docs? +- [ ] Did you write any new necessary tests? ## What does this PR do? Fixes # (issue). -## PR review -Anyone in the community is free to review the PR once the tests have passed. +## PR review +Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged. ## Did you have fun? diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 105c42a503..f493f91f0d 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -1,10 +1,10 @@ name: build on: - # Trigger the workflow on push to master or any pull request + # Trigger the workflow on push to main or any pull request push: branches: - - master + - main pull_request: jobs: diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4d7ca6a98e..3930c46196 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -5,7 +5,7 @@ possible. ## Pull Requests We actively welcome your pull requests. -1. Fork the repo and create your branch from `master`. +1. Fork the repo and create your branch from `main`. 2. If you've added code that should be tested, add tests. 3. If you've changed APIs, update the documentation. 4. Ensure the test suite passes. diff --git a/README.md b/README.md index 3316c963ce..dd68717480 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@

- MIT License + MIT License Latest Release Build Status Documentation Status @@ -48,7 +48,7 @@ We provide reference implementations of various sequence modeling papers: + [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md) + [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md) + [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md) - + [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979) + + [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979) + [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu, et al., 2021)](https://arxiv.org/abs/2104.01027) + [Unsupervised Speech Recognition (Baevski, et al., 2021)](https://arxiv.org/abs/2105.11084) * **Non-autoregressive Transformers** @@ -93,7 +93,7 @@ We provide reference implementations of various sequence modeling papers: * April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md) * March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md) * February 2020: [mBART model and code released](examples/mbart/README.md) -* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german) +* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/main/examples/backtranslation#training-your-own-model-wmt18-english-german) * December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0) * November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example) * November 2019: [CamemBERT model and code released](examples/camembert/README.md) diff --git a/docs/conf.py b/docs/conf.py index 440784bfae..87b0db98c7 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -55,7 +55,7 @@ copyright = "Facebook AI Research (FAIR)" author = "Facebook AI Research (FAIR)" -github_doc_root = "https://github.com/pytorch/fairseq/tree/master/docs/" +github_doc_root = "https://github.com/pytorch/fairseq/tree/main/docs/" # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the diff --git a/examples/adaptive_span/README.md b/examples/adaptive_span/README.md index 913a873386..d5224fb289 100644 --- a/examples/adaptive_span/README.md +++ b/examples/adaptive_span/README.md @@ -4,7 +4,7 @@ Adaptive Span is a novel self-attention mechanism that can learn its optimal attention span. This allows us to extend significantly the maximum context size used in Transformer, while maintaining control over their memory footprint and computational time. It uses the Truncated BPTT technique for training, -as in [transformerXL](https://github.com/pytorch/fairseq/blob/master/examples/truncated_bptt/README.md). +as in [transformerXL](https://github.com/pytorch/fairseq/blob/main/examples/truncated_bptt/README.md). 
Adaptive Span was introduced by paper: [Adaptive Attention Span in Transformers](https://arxiv.org/abs/1905.07799), diff --git a/examples/constrained_decoding/README.md b/examples/constrained_decoding/README.md index cfca9c91fd..e04b8b6a01 100644 --- a/examples/constrained_decoding/README.md +++ b/examples/constrained_decoding/README.md @@ -12,7 +12,7 @@ Constrained search is enabled by adding the command-line argument `--constraints Constraints are appended to each line of input, separated by tabs. Each constraint (one or more tokens) is a separate field. -The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md), +The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/main/examples/wmt19/README.md), translates the sentence *Die maschinelle Übersetzung ist schwer zu kontrollieren.* with the constraints "hard" and "to influence". diff --git a/examples/discriminative_reranking_nmt/README.md b/examples/discriminative_reranking_nmt/README.md index e6f42b1278..b155e855f2 100644 --- a/examples/discriminative_reranking_nmt/README.md +++ b/examples/discriminative_reranking_nmt/README.md @@ -38,7 +38,7 @@ source_sentence_L_hypo_1 source_sentence_L_hypo_N ``` -2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/master/examples/xlmr#pre-trained-models). +2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/main/examples/xlmr#pre-trained-models). ``` wget https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz tar zxvf xlmr.base.tar.gz diff --git a/examples/fast_noisy_channel/README.md b/examples/fast_noisy_channel/README.md index a04151a796..f2631a8c34 100644 --- a/examples/fast_noisy_channel/README.md +++ b/examples/fast_noisy_channel/README.md @@ -29,9 +29,9 @@ This framework provides a great way to utlize strong target language models trai ### Training Translation Models and Language Models -For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/translation) +For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/translation) -For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model) +For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model) ### Generation with Language Model for German-English translation with fairseq diff --git a/examples/layerdrop/README.md b/examples/layerdrop/README.md index 394e710b0f..4d48ee9615 100644 --- a/examples/layerdrop/README.md +++ b/examples/layerdrop/README.md @@ -126,9 +126,9 @@ This model override command overrides the training parameters and updates the mo Looking to reproduce the results in the paper? -1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/scaling_nmt/README.md) -2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta) -3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model) +1. 
For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/scaling_nmt/README.md) +2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta) +3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model) ## Tips diff --git a/examples/m2m_100/README.md b/examples/m2m_100/README.md index 05801584d6..02a68a5f09 100644 --- a/examples/m2m_100/README.md +++ b/examples/m2m_100/README.md @@ -82,7 +82,7 @@ fairseq-preprocess \ 3. **Training Scripts** -To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/master/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale). +To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/main/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale). 4. **Generation** diff --git a/examples/multilingual/README.md b/examples/multilingual/README.md index 0076f5e8f0..46ff9c351b 100644 --- a/examples/multilingual/README.md +++ b/examples/multilingual/README.md @@ -17,9 +17,9 @@ This work is for training multilingual translation models with multiple bitext d - --finetune-from-model to specify the path from which to load the pretrained model ## Preprocessing data -Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/master/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model. +Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/main/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model. -You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation). +You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/main/examples/translation#multilingual-translation). ## Training @@ -49,7 +49,7 @@ fairseq-train $path_2_data \ ``` ## Finetuning -We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/master/examples/mbart). +We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/main/examples/mbart). ```bash lang_pairs= path_2_data= diff --git a/examples/quant_noise/README.md b/examples/quant_noise/README.md index 539c3d5af9..a04d7e4e8a 100644 --- a/examples/quant_noise/README.md +++ b/examples/quant_noise/README.md @@ -33,7 +33,7 @@ Unlike the section [Iterative Product Quantization](#iterative-product-quantizat #### Training -Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. 
Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization). +Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization). To train a model with Quant-Noise, add the following flag: ``` @@ -49,7 +49,7 @@ When evaluating a network, all quantized modules and activation hooks automatica #### Integration with your own code Looking to quantize your own models with Quant-Noise + Scalar Quantization? -- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations. +- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations. - Then, perform your training as usual. Note that in `eval()` mode, the network is always fully quantized (weights and activations) by default (`p=1`). @@ -66,12 +66,12 @@ To train a model with Quant-Noise, add the following flags: --quant-noise-pq 0.1 --quant-noise-pq-block-size 8 ``` `quant-noise-pq` controls how much dropout is applied to the blocks of the weight matrix. `quant-noise-pq-block-size` controls the size of the weight matrix blocks. -We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy. +We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy. -We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks. +We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. 
The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks. In the Transformer architectures, quant-noise is applied to the input and output embeddings, the attention, and the FFN. -Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2. +Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/main/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2. #### Quantization @@ -84,8 +84,8 @@ For the particular case of PQ, quantization is made sequentially. We recommend f #### Integration with your own code Looking to quantize your own models with Quant-Noise + iPQ? -- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model. -- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration. +- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model. +- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration. Note that we tried our approach only on Transformers and various Convolutional Models such as EfficientNets. ```python @@ -128,7 +128,7 @@ We detail below how to reproduce the state-of-the-art results in reported in the ### Training with Quant-Noise -To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta). +To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta). The following command can be used to train a RoBERTa Base + QuantNoise model: ```bash @@ -158,7 +158,7 @@ fairseq-train $DATA_DIR \ --quant-noise-pq 0.2 --quant-noise-pq-block-size 8 --untie-weights-roberta ``` -To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md). +To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/roberta/README.glue.md). The following command can be used to finetune a RoBERTa Base + QuantNoise model on the RTE dataset: ```bash @@ -193,7 +193,7 @@ fairseq-train /path/to/rte/data/ \ --quant-noise-pq 0.2 --quant-noise-pq-block-size 8 ``` -To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model). +To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model). 
The following command can be used to train a Transformer + QuantNoise model on Wikitext-103: ```bash diff --git a/examples/roberta/README.md b/examples/roberta/README.md index 58091b2c7d..ed4d5df52c 100644 --- a/examples/roberta/README.md +++ b/examples/roberta/README.md @@ -8,13 +8,13 @@ RoBERTa iterates on BERT's pretraining procedure, including training the model l ### What's New: -- December 2020: German model (GottBERT) is available: [GottBERT](https://github.com/pytorch/fairseq/tree/master/examples/gottbert). +- December 2020: German model (GottBERT) is available: [GottBERT](https://github.com/pytorch/fairseq/tree/main/examples/gottbert). - January 2020: Italian model (UmBERTo) is available from Musixmatch Research: [UmBERTo](https://github.com/musixmatchresearch/umberto). -- November 2019: French model (CamemBERT) is available: [CamemBERT](https://github.com/pytorch/fairseq/tree/master/examples/camembert). -- November 2019: Multilingual encoder (XLM-RoBERTa) is available: [XLM-R](https://github.com/pytorch/fairseq/tree/master/examples/xlmr). +- November 2019: French model (CamemBERT) is available: [CamemBERT](https://github.com/pytorch/fairseq/tree/main/examples/camembert). +- November 2019: Multilingual encoder (XLM-RoBERTa) is available: [XLM-R](https://github.com/pytorch/fairseq/tree/main/examples/xlmr). - September 2019: TensorFlow and TPU support via the [transformers library](https://github.com/huggingface/transformers). - August 2019: RoBERTa is now supported in the [pytorch-transformers library](https://github.com/huggingface/pytorch-transformers). -- August 2019: Added [tutorial for finetuning on WinoGrande](https://github.com/pytorch/fairseq/tree/master/examples/roberta/wsc#roberta-training-on-winogrande-dataset). +- August 2019: Added [tutorial for finetuning on WinoGrande](https://github.com/pytorch/fairseq/tree/main/examples/roberta/wsc#roberta-training-on-winogrande-dataset). - August 2019: Added [tutorial for pretraining RoBERTa using your own data](README.pretraining.md). ## Pre-trained models diff --git a/examples/roberta/commonsense_qa/README.md b/examples/roberta/commonsense_qa/README.md index 05c6f841a8..7f386decd8 100644 --- a/examples/roberta/commonsense_qa/README.md +++ b/examples/roberta/commonsense_qa/README.md @@ -96,4 +96,4 @@ print('Accuracy: ' + str(ncorrect / float(nsamples))) ``` The above snippet is not batched, which makes it quite slow. See [instructions -for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta#batched-prediction). +for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta#batched-prediction). 
diff --git a/examples/shuffled_word_order/README.md b/examples/shuffled_word_order/README.md index 14c240cb56..f20483849a 100644 --- a/examples/shuffled_word_order/README.md +++ b/examples/shuffled_word_order/README.md @@ -40,7 +40,7 @@ For more results on probing tasks, please refer to [our paper](https://arxiv.org ## Example Usage -Follow the same usage as in [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) to load and test your models: +Follow the same usage as in [RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta) to load and test your models: ```python # Download roberta.base.shuffle.n1 model @@ -53,11 +53,11 @@ roberta = RoBERTaModel.from_pretrained('/path/to/roberta.base.shuffle.n1', check roberta.eval() # disable dropout (or leave in train mode to finetune) ``` -**Note**: The model trained without positional embeddings (`roberta.base.nopos`) is a modified `RoBERTa` model, where the positional embeddings are not used. Thus, the typical `from_pretrained` method on fairseq version of RoBERTa will not be able to load the above model weights. To do so, construct a new `RoBERTaModel` object by setting the flag `use_positional_embeddings` to `False` (or [in the latest code](https://github.com/pytorch/fairseq/blob/master/fairseq/models/roberta/model.py#L543), set `no_token_positional_embeddings` to `True`), and then load the individual weights. +**Note**: The model trained without positional embeddings (`roberta.base.nopos`) is a modified `RoBERTa` model, where the positional embeddings are not used. Thus, the typical `from_pretrained` method on fairseq version of RoBERTa will not be able to load the above model weights. To do so, construct a new `RoBERTaModel` object by setting the flag `use_positional_embeddings` to `False` (or [in the latest code](https://github.com/pytorch/fairseq/blob/main/fairseq/models/roberta/model.py#L543), set `no_token_positional_embeddings` to `True`), and then load the individual weights. ## Fine-tuning Evaluation -We provide the trained fine-tuned models on MNLI here for each model above for quick evaluation (1 seed for each model). Please refer to [finetuning details](README.finetuning.md) for the parameters of these models. Follow [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) instructions to evaluate these models. +We provide the trained fine-tuned models on MNLI here for each model above for quick evaluation (1 seed for each model). Please refer to [finetuning details](README.finetuning.md) for the parameters of these models. Follow [RoBERTa](https://github.com/pytorch/fairseq/tree/main/examples/roberta) instructions to evaluate these models. 
| Model | MNLI M Dev Accuracy | Link | | :----------------------------------------- | :------------------ | :--------------------------------------------------------------------------------------------------------------- | diff --git a/examples/speech_synthesis/docs/ljspeech_example.md b/examples/speech_synthesis/docs/ljspeech_example.md index 2b8d21abf9..90c524fac8 100644 --- a/examples/speech_synthesis/docs/ljspeech_example.md +++ b/examples/speech_synthesis/docs/ljspeech_example.md @@ -38,7 +38,7 @@ For your convenience, we provide pre-computed [force-alignment](https://dl.fbaipublicfiles.com/fairseq/s2/ljspeech_mfa.zip) from [Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and [pseudo-text units](s3://dl.fbaipublicfiles.com/fairseq/s2/ljspeech_hubert.tsv) from -[HuBERT](https://github.com/pytorch/fairseq/tree/master/examples/hubert). You can also generate them by yourself using +[HuBERT](https://github.com/pytorch/fairseq/tree/main/examples/hubert). You can also generate them by yourself using a different software or model. @@ -106,7 +106,7 @@ use `--sample-rate 16000` for `get_eval_manifest.py`. #### WER/CER metric -We use wav2vec 2.0 ASR model as example. [Download](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) +We use wav2vec 2.0 ASR model as example. [Download](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec) the model checkpoint and dictionary, then compute WER/CER with ```bash python -m examples.speech_synthesis.evaluation.eval_asr \ diff --git a/examples/textless_nlp/gslm/README.md b/examples/textless_nlp/gslm/README.md index 79de55d96e..7a76ffd57c 100644 --- a/examples/textless_nlp/gslm/README.md +++ b/examples/textless_nlp/gslm/README.md @@ -3,7 +3,7 @@ * [Paper](https://arxiv.org/abs/2102.01192) * [Demo](https://speechbot.github.io/gslm/index.html) -We build and evaluate generative speech2speech systems using [Log Mel Filtebank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/master/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec). Our system is composed of three components, namely, *speech2unit*, *ulm* and *unit2speech*. We explain about models and usage of these components in their respective sub-directories. See the links below. +We build and evaluate generative speech2speech systems using [Log Mel Filtebank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/main/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec). Our system is composed of three components, namely, *speech2unit*, *ulm* and *unit2speech*. We explain about models and usage of these components in their respective sub-directories. See the links below. ## Speech to Unit Model (speech2unit) Speech to unit model is used for quantizing raw speech into learned discrete speech units. [More details](speech2unit) @@ -18,4 +18,4 @@ Unit to speech model is used for synthesizing speech from discrete speech units. We show how to compute ASR based metrics as well as zero-shot metrics proposed in our paper [here](metrics). ## Tools -We share two tools to resynthesize a given spoken utterance, and generate novel spoken language given a spoken prompt. 
[More detail](tools) \ No newline at end of file +We share two tools to resynthesize a given spoken utterance, and generate novel spoken language given a spoken prompt. [More detail](tools) diff --git a/examples/wav2vec/unsupervised/README.md b/examples/wav2vec/unsupervised/README.md index 046202e01c..0b213fd202 100644 --- a/examples/wav2vec/unsupervised/README.md +++ b/examples/wav2vec/unsupervised/README.md @@ -1,6 +1,6 @@ # wav2vec Unsupervised (wav2vec-U) -Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition systems without any labeled training data as described in [Unsupervised Speech Recognition (Baevski et al., 2021)](https://ai.facebook.com/research/publications/unsupervised-speech-recognition). The model takes as input wav2vec 2.0 or XLSR representations (see [pretrained models](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec)) as well as unlabeled speech and text data. +Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition systems without any labeled training data as described in [Unsupervised Speech Recognition (Baevski et al., 2021)](https://ai.facebook.com/research/publications/unsupervised-speech-recognition). The model takes as input wav2vec 2.0 or XLSR representations (see [pretrained models](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec)) as well as unlabeled speech and text data. The wav2vec-U training procedure consists of three consecutive main steps: * Preparation of speech representations and text data @@ -8,7 +8,7 @@ Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition * Iterative self-training + Kaldi LM-decoding ## Preparation of speech and text data -Similar to [wav2vec 2.0](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md), data folders contain {train,valid,test}.{tsv,wrd,phn} files, where audio paths are stored in tsv files, and word, letter or phoneme transcriptions are stored in .{wrd,ltr,phn}. +Similar to [wav2vec 2.0](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/README.md), data folders contain {train,valid,test}.{tsv,wrd,phn} files, where audio paths are stored in tsv files, and word, letter or phoneme transcriptions are stored in .{wrd,ltr,phn}. In **/path/to/data/with_silence** you need a *train.tsv* file as well as (optionally) *{valid,test}.{tsv,wrd,phn}*. It is nice to have *10h.{tsv,phn}* files there too for reproducing the ablation study on layer selection. In **/path/to/data/without_silence** you have the same files, except *.tsv* files contain audios with silences removed using rVAD. diff --git a/fairseq/models/bart/hub_interface.py b/fairseq/models/bart/hub_interface.py index 9afe385b9d..4d47d97518 100644 --- a/fairseq/models/bart/hub_interface.py +++ b/fairseq/models/bart/hub_interface.py @@ -23,7 +23,7 @@ class BARTHubInterface(GeneratorHubInterface): """A simple PyTorch Hub interface to BART. - Usage: https://github.com/pytorch/fairseq/tree/master/examples/bart + Usage: https://github.com/pytorch/fairseq/tree/main/examples/bart """ def __init__(self, cfg, task, model): diff --git a/fairseq/models/roberta/hub_interface.py b/fairseq/models/roberta/hub_interface.py index c9af434bde..ba298d63ba 100644 --- a/fairseq/models/roberta/hub_interface.py +++ b/fairseq/models/roberta/hub_interface.py @@ -14,7 +14,7 @@ class RobertaHubInterface(nn.Module): """A simple PyTorch Hub interface to RoBERTa. 
- Usage: https://github.com/pytorch/fairseq/tree/master/examples/roberta + Usage: https://github.com/pytorch/fairseq/tree/main/examples/roberta """ def __init__(self, cfg, task, model):
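
The branch rename that this patch prepares for is not part of the diff itself. The sketch below shows roughly how such a rename is usually carried out once all in-repo references point at `main`; the remote name `origin` and the final grep cleanup are illustrative assumptions, not steps taken from this PR.

```bash
# Illustrative sketch only, not part of this patch. Assumes a local clone
# with a remote named "origin" and permission to change the default branch
# on the hosting side.

# Rename the local branch and publish it under the new name.
git branch -m master main
git push -u origin main

# After the host's default branch has been switched (e.g. in the GitHub
# repository settings), refresh the local notion of origin/HEAD and drop
# the old remote branch.
git remote set-head origin -a
git push origin --delete master

# Check for leftover references of the kind this patch rewrites, i.e.
# URLs under .../blob/master/ or .../tree/master/.
git grep -nE "fairseq/(blob|tree)/master" -- '*.md' '*.py' '*.yml'
```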