From a9e78659228c4ea68788bbb5ae07c040da8a4ba3 Mon Sep 17 00:00:00 2001
From: "Michael A. Hedderich"
Date: Tue, 27 Jul 2021 13:17:31 +0200
Subject: [PATCH 1/2] Extended Data Augmentation -> Text

- Added a structure into token level, sentence part level and sentence level augmentation
- Added some more references
- Replaced arxiv with ACL-Anthology links
---
 augmentation.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/augmentation.md b/augmentation.md
index 12345e8..f1fd543 100644
--- a/augmentation.md
+++ b/augmentation.md
@@ -57,14 +57,16 @@ While these primitives have culminated in compelling performance gains, they can
 
 Heuristic transformations for text typically involve paraphrasing text in order to produce more diverse samples.
 
-- [Backtranslation](https://arxiv.org/abs/1511.06709) uses a round-trip translation from a source to target language and back in order to generate a paraphrase.
-  Examples of use include [QANet](https://arxiv.org/abs/1804.09541).
-- Synonym substitution methods replace words with their synonyms such as in
-  [Data Augmentation for Low-Resource Neural Machine Translation](https://www.aclweb.org/anthology/P17-2090/),
-  [Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations](https://www.aclweb.org/anthology/N18-2072/),
+- On a token level, synonym substitution methods replace words with their synonyms. Synonyms might be chosen based on
+  - a knowledge base such as a thesaurus: e.g. [Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626.pdf) and [An Analysis of Simple Data Augmentation for Named Entity Recognition](https://aclanthology.org/2020.coling-main.343/)
+  - neighbors in a word embedding space: e.g. [That’s So Annoying!!!](https://www.aclweb.org/anthology/D15-1306/)
+  - probable words according to a language model that takes the sentence context into account: e.g. [Model-Portability Experiments for Textual Temporal Analysis](https://www.aclweb.org/anthology/P11-2047/),
-  [That’s So Annoying!!!](https://www.aclweb.org/anthology/D15-1306/) and
-  [Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626.pdf)
+  [Data Augmentation for Low-Resource Neural Machine Translation](https://www.aclweb.org/anthology/P17-2090/) and
+  [Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations](https://www.aclweb.org/anthology/N18-2072/)
+- Sentence parts can be reordered by manipulating the syntax tree of a sentence: e.g. [Data augmentation via dependency tree morphing for low-resource languages](https://aclanthology.org/D18-1545/)
+- The whole sentence can be modified via [Backtranslation](https://aclanthology.org/P16-1009/). Here, a round-trip translation from a source language to a target language and back is used to generate a paraphrase. Examples of use include [QANet](https://arxiv.org/abs/1804.09541) and [Unsupervised Data Augmentation for Consistency Training](https://proceedings.neurips.cc/paper/2020/hash/44feb0096faa8326192570788b38c1d1-Abstract.html).
+
 [comment]: <> (- Noising)
 [comment]: <> (- Grammar induction)

From 086cd2bc27e5dc7e22b82c3758571c29f69d1ac6 Mon Sep 17 00:00:00 2001
From: Karan Goel
Date: Tue, 27 Jul 2021 17:37:32 -0400
Subject: [PATCH 2/2] Update THANKS.md

---
 THANKS.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/THANKS.md b/THANKS.md
index ae2c04d..4aa067d 100644
--- a/THANKS.md
+++ b/THANKS.md
@@ -14,5 +14,6 @@ The following individuals and organizations have contributed to the development
 - [Ce Zhang](https://scholar.google.ch/citations?user=GkXqbmMAAAAJ&hl=en) and [Cedric Renggli](https://people.inf.ethz.ch/rengglic/) from ETH-Zurich added discussion for data cleaning and MLOps
 - [Eugene Wu](http://www.cs.columbia.edu/~ewu/) from Columbia added discussion for data cleaning
 - [Cody Coleman](http://www.codycoleman.com) from Stanford added discussion for data selection
+- [Michael Hedderich](https://michael-hedderich.de) from Saarland Informatics added discussion for data augmentation
 
-Thanks to everyone who has provided feedback on this resource, including Dan Hendrycks and Jacob Steinhardt at UC-Berkeley, James Zou, Matei Zaharia, Daniel Kang, Chelsea Finn from Stanford, Mike Cafarella from MIT, Ameet Talkwalkar from CMU.
\ No newline at end of file
+Thanks to everyone who has provided feedback on this resource, including Dan Hendrycks and Jacob Steinhardt at UC-Berkeley, James Zou, Matei Zaharia, Daniel Kang, Chelsea Finn from Stanford, Mike Cafarella from MIT, Ameet Talkwalkar from CMU.
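
To make the token-level, thesaurus-based synonym substitution added in the first patch concrete, here is a minimal sketch that is not taken from any of the cited papers. It assumes NLTK is installed and the WordNet corpus has been downloaded via `nltk.download("wordnet")`; the helper name `synonym_substitute` and the replacement probability `p` are illustrative choices.

```python
# Minimal sketch of thesaurus-based (token-level) synonym substitution.
# Assumption: NLTK is installed and the WordNet corpus has been fetched with
# nltk.download("wordnet"); function and parameter names are illustrative,
# not taken from the cited papers.
import random

from nltk.corpus import wordnet


def synonym_substitute(tokens, p=0.1, seed=None):
    """Replace each token with a random WordNet synonym with probability p."""
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        if rng.random() < p:
            # Gather candidate lemmas from every WordNet synset of the token.
            candidates = {
                lemma.name().replace("_", " ")
                for synset in wordnet.synsets(token)
                for lemma in synset.lemmas()
                if lemma.name().lower() != token.lower()
            }
            if candidates:
                token = rng.choice(sorted(candidates))
        augmented.append(token)
    return augmented


# Example: produce one augmented variant of a tokenized sentence.
print(synonym_substitute("the quick brown fox jumps over the lazy dog".split(), p=0.3, seed=0))
```

The other variants listed in the patch differ mainly in where the candidate replacements come from: embedding-based methods use nearest neighbors in a word embedding space, language-model-based methods use probable words given the sentence context, and backtranslation paraphrases the whole sentence by round-tripping it through a translation system.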