[!144][RELEASE] Release of the INES evaluation (WMT2023)

# Which work do we release?

"Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES" in WMT2023.

# What changes does this release refer to?

aebf19f029acc2516e67b6d4fd71e9673ee1ae33
3ca5b2666bc82d8902eb823435ffd1a39ede82e1
hlt-mt · Oct 18, 2023 · b2f67b5
1 parent 74dc0fb
commit b2f67b5

Showing 2 changed files with 52 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -4,6 +4,8 @@ This repository contains the open source code by the MT unit of FBK.
 Dedicated README for each work can be found in the `fbk_works` directory.
 
  ### 2023
+
+ - [[WMT 2023] **Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES**](fbk_works/INES_eval.md)  
  - [[ASRU 2023] **No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation**](fbk_works/PITCH_MANIPULATION_ASR.md)
  - [[TACL 2023] **Direct Speech Translation for Automatic Subtitling**](fbk_works/DIRECT_SUBTITLING.md)
  - [[INTERSPEECH 2023] **AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation**](fbk_works/ALIGNATT_SIMULST_AGENT_INTERSPEECH2023.md)

diff --git a/fbk_works/INES_eval.md b/fbk_works/INES_eval.md
@@ -0,0 +1,50 @@
+# INES Test Suite Evaluation (WMT 2023)
+Code to evaluate MT systems on the INclusive Evaluation Suite (INES).
+
+## INES Evaluation
+
+We release the code for the FBK participation in the WMT Test Suites shared subtask: [**INES_eval.py**](../examples/speech_to_text/scripts/gender/INES_eval.py).
+It assesses the ability of MT systems to generate inclusive language forms rather than non-inclusive ones when translating from German into English on the [**INES test set**](https://mt.fbk.eu/ines/).
+
+
+For systems run on the INES test suite, the evaluation script `INES_eval.py` computes:
+
+* **inclusivity_index** scores (INES official metric)
+
+* **terms coverage** and **gender accuracy** scores (additional metrics)
+
+
+### Usage
+
+The script requires Python 3.
+
+It takes two mandatory arguments:
+
+    --input FILE
+    --tsv-definition FILE
+
+These are, respectively, the output of the system you want to evaluate and the [**INES.tsv**](https://mt.fbk.eu/ines/) file (the Gold Standard). Note that the output must be tokenized (e.g. with Moses' tokenizer.perl).
+
+You can run `INES_eval.py --help` to get the list of parameters accepted by the script.
+Terms coverage and gender accuracy are computed only when requested through an optional argument.
+
+Example Usage
+
+    python3 INES_eval.py --input MT_OUTPUT_FILE --tsv-definition INES.tsv
+
+
+## 📍Citation
+
+If you use this code, please cite the following paper, which also provides more information:
+
+```bibtex
+@inproceedings{savoldi-etal-2023-test,
+    title = {{Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES}},
+    author = {Savoldi, Beatrice  and Gaido, Marco  and Negri, Matteo and Bentivogli, Luisa},
+    booktitle = {Proceedings of the 8th International Conference on Machine Translation (WMT 2023)},
+    month = dec,
+    year = "2023",
+    address = "Singapore",
+    publisher = "Association for Computational Linguistics",
+}
+```
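
The invocation described in the Usage section of `INES_eval.md` can be sketched as a small Python helper that assembles the command line. This is only an illustration: the two flags `--input` and `--tsv-definition` come from the README, while the helper name `build_ines_command`, the example file name `mt_output.tok.en`, and the assumption that `INES_eval.py` sits in the current directory are hypothetical.

```python
import shlex
import sys


def build_ines_command(mt_output, tsv_definition, script="INES_eval.py"):
    """Return the argument list for one run of the evaluation script.

    Only --input and --tsv-definition are documented in the README.
    The function name and the default script location are assumptions
    made for this sketch.
    """
    return [
        sys.executable,          # the Python 3 interpreter currently in use
        str(script),
        "--input", str(mt_output),             # tokenized MT output
        "--tsv-definition", str(tsv_definition),  # INES.tsv gold standard
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_ines_command("mt_output.tok.en", "INES.tsv")
    # Print the command in shell-quoted form instead of running it.
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list, rather than as a single string, keeps file names that contain spaces intact as single arguments when the list is later passed to `subprocess.run`.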