Rename references from master -> main in preparation for branch name change (facebookresearch#2297)

Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2297

Reviewed By: alexeib

Differential Revision: D30906090

Pulled By: dianaml0

fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
Diana Liskovich authored and facebook-github-bot committed Sep 20, 2021
1 parent f6abcc2 commit 5adfeac
Showing 23 changed files with 57 additions and 57 deletions.
.github/ISSUE_TEMPLATE/bug_report.md (4 changes: 2 additions & 2 deletions)
@@ -19,7 +19,7 @@ Steps to reproduce the behavior (**always include the command you ran**):


#### Code sample
-<!-- Ideally attach a minimal code sample to reproduce the decried issue.
+<!-- Ideally attach a minimal code sample to reproduce the decried issue.
Minimal means having the shortest code but still preserving the bug. -->

### Expected behavior
@@ -28,7 +28,7 @@ Minimal means having the shortest code but still preserving the bug. -->

### Environment

-- fairseq Version (e.g., 1.0 or master):
+- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (`pip`, source):
.github/ISSUE_TEMPLATE/how-to-question.md (10 changes: 5 additions & 5 deletions)
@@ -6,23 +6,23 @@ labels: 'question, needs triage'

## ❓ Questions and Help

-### Before asking:
-1. search the issues.
-2. search the docs.
+### Before asking:
+1. search the issues.
+2. search the docs.

<!-- If you still can't find what you need: -->

#### What is your question?

#### Code

-<!-- Please paste a code snippet if your question requires it! -->
+<!-- Please paste a code snippet if your question requires it! -->

#### What have you tried?

#### What's your environment?

-- fairseq Version (e.g., 1.0 or master):
+- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (`pip`, source):
.github/PULL_REQUEST_TEMPLATE.md (10 changes: 5 additions & 5 deletions)
@@ -1,15 +1,15 @@
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
-- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
-- [ ] Did you make sure to update the docs?
-- [ ] Did you write any new necessary tests?
+- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
+- [ ] Did you make sure to update the docs?
+- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

-## PR review
-Anyone in the community is free to review the PR once the tests have passed.
+## PR review
+Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
.github/workflows/build.yml (4 changes: 2 additions & 2 deletions)
@@ -1,10 +1,10 @@
name: build

on:
-  # Trigger the workflow on push to master or any pull request
+  # Trigger the workflow on push to main or any pull request
  push:
    branches:
-      - master
+      - main
  pull_request:

jobs:
CONTRIBUTING.md (2 changes: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ possible.
## Pull Requests
We actively welcome your pull requests.

-1. Fork the repo and create your branch from `master`.
+1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
README.md (6 changes: 3 additions & 3 deletions)
@@ -2,7 +2,7 @@
<img src="docs/fairseq_logo.png" width="150">
<br />
<br />
-<a href="https://github.com/pytorch/fairseq/blob/master/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/license-MIT-blue.svg" /></a>
+<a href="https://github.com/pytorch/fairseq/blob/main/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/license-MIT-blue.svg" /></a>
<a href="https://github.com/pytorch/fairseq/releases"><img alt="Latest Release" src="https://img.shields.io/github/release/pytorch/fairseq.svg" /></a>
<a href="https://github.com/pytorch/fairseq/actions?query=workflow:build"><img alt="Build Status" src="https://github.com/pytorch/fairseq/workflows/build/badge.svg" /></a>
<a href="https://fairseq.readthedocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/fairseq/badge/?version=latest" /></a>
@@ -48,7 +48,7 @@ We provide reference implementations of various sequence modeling papers:
+ [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
+ [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
+ [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
-+ [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)
++ [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)
+ [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu, et al., 2021)](https://arxiv.org/abs/2104.01027)
+ [Unsupervised Speech Recognition (Baevski, et al., 2021)](https://arxiv.org/abs/2105.11084)
* **Non-autoregressive Transformers**
@@ -93,7 +93,7 @@ We provide reference implementations of various sequence modeling papers:
* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)
* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)
* February 2020: [mBART model and code released](examples/mbart/README.md)
-* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german)
+* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/main/examples/backtranslation#training-your-own-model-wmt18-english-german)
* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)
* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
* November 2019: [CamemBERT model and code released](examples/camembert/README.md)
docs/conf.py (2 changes: 1 addition & 1 deletion)
@@ -55,7 +55,7 @@
copyright = "Facebook AI Research (FAIR)"
author = "Facebook AI Research (FAIR)"

-github_doc_root = "https://github.com/pytorch/fairseq/tree/master/docs/"
+github_doc_root = "https://github.com/pytorch/fairseq/tree/main/docs/"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
examples/adaptive_span/README.md (2 changes: 1 addition & 1 deletion)
@@ -4,7 +4,7 @@ Adaptive Span is a novel self-attention mechanism that can learn its optimal
attention span. This allows us to extend significantly the maximum context size
used in Transformer, while maintaining control over their memory footprint
and computational time. It uses the Truncated BPTT technique for training,
-as in [transformerXL](https://github.com/pytorch/fairseq/blob/master/examples/truncated_bptt/README.md).
+as in [transformerXL](https://github.com/pytorch/fairseq/blob/main/examples/truncated_bptt/README.md).

Adaptive Span was introduced by paper:
[Adaptive Attention Span in Transformers](https://arxiv.org/abs/1905.07799),
examples/constrained_decoding/README.md (2 changes: 1 addition & 1 deletion)
@@ -12,7 +12,7 @@ Constrained search is enabled by adding the command-line argument `--constraints`
Constraints are appended to each line of input, separated by tabs. Each constraint (one or more tokens)
is a separate field.

-The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md),
+The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/main/examples/wmt19/README.md),
translates the sentence *Die maschinelle Übersetzung ist schwer zu kontrollieren.* with the constraints
"hard" and "to influence".

examples/discriminative_reranking_nmt/README.md (2 changes: 1 addition & 1 deletion)
@@ -38,7 +38,7 @@ source_sentence_L_hypo_1
source_sentence_L_hypo_N
```

-2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/master/examples/xlmr#pre-trained-models).
+2. Download the [XLMR model](https://github.com/fairinternal/fairseq-py/tree/main/examples/xlmr#pre-trained-models).
```
wget https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz
tar zxvf xlmr.base.tar.gz
examples/fast_noisy_channel/README.md (4 changes: 2 additions & 2 deletions)
@@ -29,9 +29,9 @@ This framework provides a great way to utlize strong target language models trai

### Training Translation Models and Language Models

-For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/translation)
+For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/translation)

-For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model)
+For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model)

### Generation with Language Model for German-English translation with fairseq

examples/layerdrop/README.md (6 changes: 3 additions & 3 deletions)
@@ -126,9 +126,9 @@ This model override command overrides the training parameters and updates the mo

Looking to reproduce the results in the paper?

-1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/scaling_nmt/README.md)
-2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta)
-3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model)
+1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/scaling_nmt/README.md)
+2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta)
+3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model)


## Tips
examples/m2m_100/README.md (2 changes: 1 addition & 1 deletion)
@@ -82,7 +82,7 @@ fairseq-preprocess \

3. **Training Scripts**

-To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/master/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale).
+To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/main/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale).

4. **Generation**

examples/multilingual/README.md (6 changes: 3 additions & 3 deletions)
@@ -17,9 +17,9 @@ This work is for training multilingual translation models with multiple bitext d
- --finetune-from-model to specify the path from which to load the pretrained model

## Preprocessing data
-Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/master/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model.
+Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/main/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model.

-You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation).
+You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/main/examples/translation#multilingual-translation).

## Training

@@ -49,7 +49,7 @@ fairseq-train $path_2_data \
```

## Finetuning
-We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/master/examples/mbart).
+We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/main/examples/mbart).
```bash
lang_pairs=<language pairs to be trained, e.g. "en-cs,cs-en">
path_2_data=<set to data path>
examples/quant_noise/README.md (20 changes: 10 additions & 10 deletions)
@@ -33,7 +33,7 @@ Unlike the section [Iterative Product Quantization](#iterative-product-quantizat

#### Training

-Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization).
+Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization).

To train a model with Quant-Noise, add the following flag:
```
@@ -49,7 +49,7 @@ When evaluating a network, all quantized modules and activation hooks automatica
#### Integration with your own code

Looking to quantize your own models with Quant-Noise + Scalar Quantization?
-- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations.
+- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations.
- Then, perform your training as usual. Note that in `eval()` mode, the network is always fully quantized (weights and activations) by default (`p=1`).


@@ -66,12 +66,12 @@ To train a model with Quant-Noise, add the following flags:
--quant-noise-pq 0.1 --quant-noise-pq-block-size 8
```
`quant-noise-pq` controls how much dropout is applied to the blocks of the weight matrix. `quant-noise-pq-block-size` controls the size of the weight matrix blocks.
-We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy.
+We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy.

-We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks.
+We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks.
In the Transformer architectures, quant-noise is applied to the input and output embeddings, the attention, and the FFN.

-Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2.
+Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/main/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2.

#### Quantization

@@ -84,8 +84,8 @@ For the particular case of PQ, quantization is made sequentially. We recommend f
#### Integration with your own code

Looking to quantize your own models with Quant-Noise + iPQ?
-- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model.
-- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration.
+- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model.
+- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/main/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration.
Note that we tried our approach only on Transformers and various Convolutional Models such as EfficientNets.

```python
@@ -128,7 +128,7 @@ We detail below how to reproduce the state-of-the-art results in reported in the

### Training with Quant-Noise

-To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).
+To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/roberta).
The following command can be used to train a RoBERTa Base + QuantNoise model:

```bash
@@ -158,7 +158,7 @@ fairseq-train $DATA_DIR \
--quant-noise-pq 0.2 --quant-noise-pq-block-size 8 --untie-weights-roberta
```

-To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md).
+To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/main/examples/roberta/README.glue.md).
The following command can be used to finetune a RoBERTa Base + QuantNoise model on the RTE dataset:

```bash
@@ -193,7 +193,7 @@ fairseq-train /path/to/rte/data/ \
--quant-noise-pq 0.2 --quant-noise-pq-block-size 8
```

-To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model).
+To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/main/examples/language_model).
The following command can be used to train a Transformer + QuantNoise model on Wikitext-103:

```bash
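The diff above consists of mechanical rewrites of branch references from master to main. For illustration only, the sketch below shows one way such a link rewrite could be scripted; it is not the tooling behind this commit, and the URL pattern and file extensions are assumptions.

```python
# Illustrative sketch (not from this commit): rewrite pytorch/fairseq GitHub
# links that pin the old "master" branch so they point at "main" instead.
import pathlib
import re

# Only the branch segment of blob/tree URLs is rewritten, e.g.
# .../fairseq/blob/master/... -> .../fairseq/blob/main/...
PATTERN = re.compile(r"(github\.com/pytorch/fairseq/(?:blob|tree)/)master")


def rewrite_links(root: str = ".") -> None:
    for path in pathlib.Path(root).rglob("*"):
        # Restrict to the text formats touched by this kind of change.
        if not path.is_file() or path.suffix not in {".md", ".py", ".yml"}:
            continue
        text = path.read_text(encoding="utf-8")
        updated = PATTERN.sub(r"\1main", text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            print(f"updated {path}")


if __name__ == "__main__":
    rewrite_links()
```

A pattern this narrow would still miss bare branch names (for example the `- master` entry in `.github/workflows/build.yml` or "1.0 or master" in the issue templates) and links into other repositories such as fairinternal/fairseq-py, which is why changes like the ones in this commit are typically also reviewed by hand.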