This project runs experiments comparing the benefit of soft labeling and low-agreement filtering against label aggregation for learning classification models on natural language tasks. It is the experiment code described in the paper "Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks" (Jamison and Gurevych, 2015).
Please use the following citation:
@inproceedings{TUD-CS-2015179,
  author    = {Emily Jamison and Iryna Gurevych},
  title     = {Noise or additional information? Leveraging crowdsource annotation
               item agreement for natural language tasks},
  month     = sep,
  year      = {2015},
  publisher = {Association for Computational Linguistics},
  booktitle = {Proceedings of the 2015 Conference on Empirical Methods in
               Natural Language Processing (EMNLP)},
  pages     = {291--297},
  address   = {Lisbon, Portugal},
  pubkey    = {TUD-CS-2015-1179},
  research_area = {Ubiquitous Knowledge Processing},
  research_sub_area = {UKP_reviewed},
  url       = {https://aclweb.org/anthology/D/D15/D15-1035.pdf}
}
Abstract: In order to reduce noise in training data, most natural language crowdsourcing annotation tasks gather redundant labels and aggregate them into an integrated label, which is provided to the classifier. However, aggregation discards potentially useful information from linguistically ambiguous instances. For five natural language tasks, we pass item agreement on to the task classifier via soft labeling and low-agreement filtering of the training dataset. We find a statistically significant benefit from low item agreement training filtering in four of our five tasks, and no systematic benefit from soft labeling.
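To make the two strategies concrete, here is a toy sketch (not code from this repository; the instance format, names, and the 0.8 threshold are invented for illustration):

```groovy
// Each training instance carries the fraction of annotators who chose
// its majority label ("item agreement").
class Instance {
    String text
    String majorityLabel
    double itemAgreement   // e.g. 0.6 if 3 of 5 annotators agreed
}

def training = [
        new Instance(text: 'ex1', majorityLabel: 'pos', itemAgreement: 1.0),
        new Instance(text: 'ex2', majorityLabel: 'neg', itemAgreement: 0.6)
]

// Soft labeling: keep every instance and pass the agreement on to the
// learner, e.g. as an instance weight.
def softLabeled = training.collect { [instance: it, weight: it.itemAgreement] }

// Low-agreement filtering: drop ambiguous instances below a threshold.
def filtered = training.findAll { it.itemAgreement >= 0.8 }

assert softLabeled.size() == 2   // nothing discarded
assert filtered.size() == 1      // the ambiguous instance is dropped
```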
Contact person: Emily Jamison, EmilyKJamison {at} gmail {dot} com
http://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
src/main/groovy/de/tudarmstadt/ukp/experiments/ej/repeatwithcrowdsource
-- this folder contains the Java experiment code for the 5 natural language tasks

src/main/resources/scripts
-- this folder contains the Groovy files where experiment parameters may be set

Please note: 3rd-party datasets must be downloaded from elsewhere.
- Java 1.7 or higher
- Maven
- Tested on 64-bit Linux with 2 GB RAM (-Xmx2g)
- Follow the DKPro Core instructions to set your DKPro Home environment variable.
- All dependencies are available from Maven Central; no 3rd-party projects need to be installed.
- You will need to obtain the necessary corpora for the respective experiment you plan to run. Corpora and their locations are described in (Jamison & Gurevych 2015), cited above.
- For all experiments except Affective Text, prepare your corpus for our experiment architecture by dividing instances into cross-validation rounds of training and test data (a sketch of one way to do this follows the file listing below). We created "dev" and "final" batches of "train" and "test" datasets, resulting in (for RTE):
rte_orig.r0.devtest.txt
rte_orig.r0.devtrain.txt
rte_orig.r0.finaltest.txt
rte_orig.r0.finaltrain.txt
rte_orig.r1.devtest.txt
rte_orig.r1.devtrain.txt
rte_orig.r1.finaltest.txt
rte_orig.r1.finaltrain.txt
etc.
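How you produce these files is up to you. The following is a minimal sketch (not part of this repository), assuming one instance per line; the fold count, source file name, and the decision to write the "dev" and "final" batches from the same shuffle are all illustrative:

```groovy
// Split a corpus file into per-round train/test files named like the
// listing above. Assumes one instance per line (adapt to your format).
def lines = new File('rte_orig.txt').readLines()
Collections.shuffle(lines, new Random(42))   // fixed seed for reproducibility

int folds = 10   // illustrative; use as many rounds as you need
def chunks = lines.collate((int) Math.ceil(lines.size() / (double) folds))

['dev', 'final'].each { batch ->
    chunks.eachWithIndex { testChunk, r ->
        def trainLines = chunks.findAll { !it.is(testChunk) }.flatten()
        new File("rte_orig.r${r}.${batch}test.txt").text = testChunk.join('\n')
        new File("rte_orig.r${r}.${batch}train.txt").text = trainLines.join('\n')
    }
}
```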
For each experiment, update file locations in the Groovy file in src/main/resources/scripts (such as in the method runManualCVExperiment()).
To run an experiment, first set the experiment parameters in the respective Groovy file in src/main/resources/scripts; in particular, you may wish to change the path to your corpus, the parameter instanceModeTrain, the feature set, or the feature parameters.
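The variable names differ from script to script; the snippet below only illustrates the kind of edits meant here, with placeholder values (check the actual Groovy file for the real names and admissible values):

```groovy
// Illustrative placeholders only -- not copied from the scripts.
def corpusFilePathTrain = '/path/to/corpus/rte_orig.r0.devtrain.txt'
def instanceModeTrain   = 'soft'       // hypothetical value; see the script for options
def featureSet          = ['ngrams']   // hypothetical feature name
```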
Then, run the respective "RunXXXExperiment" class in src/main/groovy/EXPERIMENTTORUN/. For example, to run the Biased Language experiment, run the class src/main/groovy/biasedlanguage/RunBiasedLangExperiment.
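If the POM is set up for the exec-maven-plugin (an assumption; you can equally run the class from your IDE), an invocation along the lines of mvn compile exec:java -Dexec.mainClass=de.tudarmstadt.ukp.experiments.ej.repeatwithcrowdsource.biasedlanguage.RunBiasedLangExperiment should work; verify the fully qualified class name first.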
Affective Text experiments run in a few seconds, while POS Tagging experiments may take several hours.
After running the experiments, results are printed to stdout. They can also be found in your DKPro Home folder, under de.tudarmstadt.ukp.dkpro.lab/repository. You can change which results get printed in src/main/groovy/util/CombineTestResultsRegression or CombineTestResultsClassification, as appropriate: the Biased Language and Affective Text tasks use regression, while Stemming, RTE, and POS Tagging use classification.