diff --git a/README.md b/README.md
index 06983e0..51c50fa 100644
--- a/README.md
+++ b/README.md
@@ -9,24 +9,24 @@ Please join our [Slack workspace](https://genderrewriting.slack.com/) and [Googl
 ## Task Description:
-The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/1st person and listener/2nd person). In this task, we focus on Arabic, a gender-marking morphologically rich language. The task of gender rewriting was introduced by [Alhafni et al. (2022)](XX).
+The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/1st person and listener/2nd person). In this task, we focus on Arabic, a gender-marking morphologically rich language. The task of gender rewriting was introduced by [Alhafni et al. (2022)](https://arxiv.org/pdf/2205.02211.pdf).
 ## Data:
-All participating teams will use the publicly available [Arabic Parallel Gender Corpus v2.1](XYZ) to train and test their systems. Participants are not allowed to use external manually labeled datasets, but they can leverage unlabeled data to create synthetic examples (i.e., data augmentation). A blind test set will be used to evaluate the outputs of participating teams. All teams are required to report on the development and test sets in their write ups.
+All participating teams will use the publicly available [Arabic Parallel Gender Corpus v2.1](https://camel.abudhabi.nyu.edu/arabic-parallel-gender-corpus/) to train and test their systems. Participants are not allowed to use external manually labeled datasets, but they can leverage unlabeled data to create synthetic examples (i.e., data augmentation). A blind test set will be used to evaluate the outputs of participating teams. All teams are required to report on the development and test sets in their write ups.
 ## Evaluation:
-We will treat the task of gender rewriting as a user-aware grammatical error task and use the [M2 Scorer](XX) as the evaluation metric. The [M2 Scorer](XX) computes the Precision, Recall, and F0.5 of the word-level edits between the input and the rewritten output against the gold edits. We provide instructions on how to run the evaluation script below.
+We will treat the task of gender rewriting as a user-aware grammatical error task and use the [M2 Scorer](https://aclanthology.org/N12-1067.pdf) as the evaluation metric. The M2 Scorer computes the Precision, Recall, and F0.5 of the word-level edits between the input and the rewritten output against the gold edits. We provide instructions on how to run the evaluation script below.
 ### Requirements:
 You will need to have [conda](https://docs.conda.io/en/latest/miniconda.html) installed. To setup the environment, you would need to run:
 ```bash
-git clone https://github.com/balhafni/gender-rewriting-shared-task.git
+git clone https://github.com/CAMeL-Lab/gender-rewriting-shared-task.git
 cd gender-rewriting-shared-task
 bash scripts/create_envs.sh
@@ -46,7 +46,7 @@ Your system should generate ***four*** output files. Each one of those output fi
 3. **Target MF**: Masculine first person and feminine second person.
 4. **Target FF**: Feminine first person and masculine second person.
-Once you have the four outputs, place them in a single directory and name them respectively as: **arin.to.MM**, **arin.to.FM**, **arin.to.MF**, and **arin.to.FF**. Since the Arabic Parallel Gender Corpus v2.1 is balanced by design, all of the four files should have the same number of sentences. The [output_example](xx) folder shows how the files should look like when you run your system on the dev set.

+Once you have the four outputs, place them in a single directory and name them respectively as: **arin.to.MM**, **arin.to.FM**, **arin.to.MF**, and **arin.to.FF**. Since the Arabic Parallel Gender Corpus v2.1 is balanced by design, all of the four files should have the same number of sentences. The [output_example](output_example/) folder shows how the files should look like when you run your system on the dev set.

 To run the m2scorer on your system's output, you would need to run:
@@ -75,5 +75,7 @@ This repo is available under the MIT license. See the [LICENSE](LICENSE) for mor
 ## References
+1. [User-Centric Gender Rewriting](https://arxiv.org/pdf/2205.02211.pdf). Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, Washington.
+2. [The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses](https://arxiv.org/pdf/2110.09216.pdf). Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), Marseille, France.