Commit

Move data to GDrive
jinyongyoo committed Sep 12, 2021
1 parent 4888af0 commit dc898cd
Showing 27 changed files with 2 additions and 1,034,808 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -25,9 +25,9 @@ pip install -r requirements.txt
## Data
All of the data used for the paper are available from HuggingFace's [Datasets](https://huggingface.co/datasets).

-For IMDB and Yelp datasets, because there are no official validation splits, we randomly sampled 5k and 10k, respectively, from the training set and used them as valid splits. The splits are available in `data`.
+For IMDB and Yelp datasets, because there are no official validation splits, we randomly sampled 5k and 10k, respectively, from the training set and used them as valid splits. We provide the splits in this Google Drive [folder](https://drive.google.com/drive/folders/1-vvSXUzl1PzMzdyZzAWq2dB--m7tEERK?usp=sharing).

-Also, augmented training data generated using SSMBA and back-translation are available under `data/augmented_data`.
+Also, augmented training data generated using SSMBA and back-translation are available in the same folder.
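
For readers who want to draw a comparable validation split themselves instead of downloading it, the following is a minimal sketch of holding out a fixed number of randomly chosen training examples. It is not the authors' script: the function name `split_train_valid` and the seed value are hypothetical, so it will not reproduce the exact splits in the Google Drive folder.

```python
import random


def split_train_valid(num_examples, valid_size, seed=0):
    """Randomly hold out `valid_size` example indices for validation.

    Hypothetical helper for illustration only; the seed is an assumption,
    since the README does not state how the original splits were drawn.
    Returns (train_indices, valid_indices), each sorted in ascending order.
    """
    if not 0 <= valid_size <= num_examples:
        raise ValueError("valid_size must be between 0 and num_examples")
    rng = random.Random(seed)          # local RNG keeps the split reproducible
    indices = list(range(num_examples))
    rng.shuffle(indices)
    valid_indices = sorted(indices[:valid_size])
    train_indices = sorted(indices[valid_size:])
    return train_indices, valid_indices


# IMDB's official training set has 25,000 reviews; hold out 5,000 of them.
train_idx, valid_idx = split_train_valid(25_000, 5_000, seed=0)
```

The returned index lists can then be used to select rows from the training set, for example with the `select` method of a HuggingFace `Dataset`.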

## Training
To train BERT model on IMDB dataset with A2T attack for 4 epochs and 1 clean epoch with gamma of 0.2:
20,001 changes: 0 additions & 20,001 deletions data/augmented_data/imdb_backtranslation/train-1.tsv

This file was deleted.

20,001 changes: 0 additions & 20,001 deletions data/augmented_data/imdb_backtranslation/train-2.tsv

This file was deleted.

20,001 changes: 0 additions & 20,001 deletions data/augmented_data/imdb_backtranslation/train-3.tsv

This file was deleted.

20,001 changes: 0 additions & 20,001 deletions data/augmented_data/imdb_ssmba/train-1.tsv

This file was deleted.
