diff --git a/README.md b/README.md
index 06983e0..51c50fa 100644
--- a/README.md
+++ b/README.md
@@ -9,24 +9,24 @@ Please join our [Slack workspace](https://genderrewriting.slack.com/) and [Googl
## Task Description:
-The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/1st person and listener/2nd person). In this task, we focus on Arabic, a gender-marking morphologically rich language. The task of gender rewriting was introduced by [Alhafni et al. (2022)](XX).
+The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/1st person and listener/2nd person). In this task, we focus on Arabic, a gender-marking morphologically rich language. The task of gender rewriting was introduced by [Alhafni et al. (2022)](https://arxiv.org/pdf/2205.02211.pdf).
## Data:
-All participating teams will use the publicly available [Arabic Parallel Gender Corpus v2.1](XYZ) to train and test their systems. Participants are not allowed to use external manually labeled datasets, but they can leverage unlabeled data to create synthetic examples (i.e., data augmentation). A blind test set will be used to evaluate the outputs of participating teams. All teams are required to report on the development and test sets in their write ups.
+All participating teams will use the publicly available [Arabic Parallel Gender Corpus v2.1](https://camel.abudhabi.nyu.edu/arabic-parallel-gender-corpus/) to train and test their systems. Participants are not allowed to use external manually labeled datasets, but they can leverage unlabeled data to create synthetic examples (i.e., data augmentation). A blind test set will be used to evaluate the outputs of participating teams. All teams are required to report on the development and test sets in their write-ups.
## Evaluation:
-We will treat the task of gender rewriting as a user-aware grammatical error task and use the [M2 Scorer](XX) as the evaluation metric. The [M2 Scorer](XX) computes the Precision, Recall, and F0.5 of the word-level edits between the input and the rewritten output against the gold edits. We provide instructions on how to run the evaluation script below.
+We will treat the task of gender rewriting as a user-aware grammatical error task and use the [M2 Scorer](https://aclanthology.org/N12-1067.pdf) as the evaluation metric. The M2 Scorer computes the Precision, Recall, and F0.5 of the word-level edits between the input and the rewritten output against the gold edits. We provide instructions on how to run the evaluation script below.
### Requirements:
You will need to have [conda](https://docs.conda.io/en/latest/miniconda.html) installed. To setup the environment, you would need to run:
```bash
-git clone https://github.com/balhafni/gender-rewriting-shared-task.git
+git clone https://github.com/CAMeL-Lab/gender-rewriting-shared-task.git
cd gender-rewriting-shared-task
bash scripts/create_envs.sh
@@ -46,7 +46,7 @@ Your system should generate ***four*** output files. Each one of those output fi
3. **Target MF**: Masculine first person and feminine second person.
4. **Target FF**: Feminine first person and feminine second person.
-Once you have the four outputs, place them in a single directory and name them respectively as: **arin.to.MM**, **arin.to.FM**, **arin.to.MF**, and **arin.to.FF**. Since the Arabic Parallel Gender Corpus v2.1 is balanced by design, all of the four files should have the same number of sentences. The [output_example](xx) folder shows how the files should look like when you run your system on the dev set.
+Once you have the four outputs, place them in a single directory and name them, respectively, **arin.to.MM**, **arin.to.FM**, **arin.to.MF**, and **arin.to.FF**. Since the Arabic Parallel Gender Corpus v2.1 is balanced by design, all four files should have the same number of sentences. The [output_example](output_example/) folder shows what the files should look like when you run your system on the dev set.
To run the m2scorer on your system's output, you would need to run:
@@ -75,5 +75,7 @@ This repo is available under the MIT license. See the [LICENSE](LICENSE) for mor
## References
+1. [User-Centric Gender Rewriting](https://arxiv.org/pdf/2205.02211.pdf). Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, Washington.
+2. [The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses](https://arxiv.org/pdf/2110.09216.pdf). Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), Marseille, France.