This software was developed for the CLEF 2022 Text Simplification task.

Hisarlik/simpleTextCLEF

simpleTextCLEF

Our work uses the transfer learning capabilities of the pre-trained T5 language model and adds a method to control specific simplification features. We present a new feature based on masked-token prediction (Language Model Fill-Mask) to control the lexical complexity of the generated text. The results obtained with the SARI metric are on par with previous work on sentence simplification in other domains.
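As a rough illustration of the control-feature idea, the sketch below computes two ratios between a source sentence and its simplification and prepends them to the model input as control tokens. The token names (`C` for the character ratio, `W` for the word ratio), the `simplify:` prefix, and the bucketing step are assumptions made for this example, not necessarily the format used in this repository. The Fill-Mask lexical-complexity feature is not reproduced here.

```python
def bucket(value: float, step: float = 0.05, max_value: float = 2.0) -> float:
    """Round a ratio to the nearest multiple of `step`, capped at `max_value`."""
    capped = min(value, max_value)
    return round(round(capped / step) * step, 2)


def char_ratio(source: str, target: str) -> float:
    """Character-length ratio target/source (hypothetical feature 'C')."""
    if not source:
        raise ValueError("source sentence must not be empty")
    return bucket(len(target) / len(source))


def word_ratio(source: str, target: str) -> float:
    """Word-count ratio target/source (hypothetical feature 'W')."""
    if not source.split():
        raise ValueError("source sentence must contain at least one word")
    return bucket(len(target.split()) / len(source.split()))


def add_control_tokens(source: str, features: dict) -> str:
    """Prepend one control token per feature to the source sentence."""
    tokens = " ".join(f"<{name}_{value}>" for name, value in features.items())
    return f"simplify: {tokens} {source}"


source = "The feline reposed upon the rug."
target = "The cat sat on the rug."

# During training the ratios come from the reference simplification;
# at inference time they would be chosen by the user or by a search.
features = {"C": char_ratio(source, target), "W": word_ratio(source, target)}
model_input = add_control_tokens(source, features)
print(model_input)
```

During training, the token values are computed from each source/reference pair, so the model can learn to associate them with the amount of compression. At inference time, the same tokens can be set to a desired value.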

Steps to replicate the results:

  1. Clone this repository
  2. Install dependencies:
     pip install -r requirements.txt
  3. Training: select the hyperparameters in T5_train.py, then run:
     python scripts/T5_train.py
  4. Optimization: select experiment_id, dataset and trials in optimization.py, then run:
     python scripts/optimization.py
  5. Testing: select experiment_id and dataset in T5_evaluate.py, then run:
     python scripts/T5_evaluate.py

The same steps apply to the larger version. Be careful with memory issues.
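The training, optimization, and test steps can also be chained from a single driver. The following is a minimal sketch: the script paths are the ones listed above, while the helper names (`build_commands`, `run_pipeline`) and the `dry_run` flag are hypothetical and not part of this repository. Hyperparameters, experiment_id, dataset and trials still have to be set inside the scripts themselves.

```python
import subprocess
import sys

# Script paths taken from the replication steps in this README.
STEPS = {
    "train": "scripts/T5_train.py",
    "optimize": "scripts/optimization.py",
    "evaluate": "scripts/T5_evaluate.py",
}


def build_commands(steps=("train", "optimize", "evaluate"), python=sys.executable):
    """Return one command (as an argument list) per requested step, in order."""
    return [[python, STEPS[name]] for name in steps]


def run_pipeline(steps=("train", "optimize", "evaluate"), dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(steps):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)


# Dry run: only prints the commands that would be executed.
run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would run the three scripts in sequence and stop at the first failure.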

Data

Download the dataset from https://simpletext-project.com/2022/clef/en/tasks. The raw data must be preprocessed with the notebook 1.Preprocessing dataset Task 3.ipynb before use.
