Extracts templated Open Information Extraction (OIE) propositions. It allows diverse, non-contiguous, multi-word predicates while keeping the arguments short and useful for downstream applications.
For example, given the sentence:
Under the agreement with the House and Senate leaders , the minimum wage would rise from the current $ 3.35 an hour to $ 4.25 an hour.
One of the extractions is:
Under {A0} {A1} would rise from {A2} to {A3}
- A0: the agreement
- A1: the minimum wage
- A2: $ 3.35 an hour
- A3: $ 4.25 an hour
Note that the head of the predicate (rise) is also identified.
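To illustrate how a template and its arguments fit together, here is a minimal sketch (not part of this repository) that fills the example template back in and lists the pieces of the non-contiguous predicate. It assumes placeholders always have the form `{A<number>}`, as in the example above; the helper names are hypothetical.

```python
import re

# Template and arguments copied from the example extraction above.
TEMPLATE = "Under {A0} {A1} would rise from {A2} to {A3}"
ARGS = {
    "A0": "the agreement",
    "A1": "the minimum wage",
    "A2": "$ 3.35 an hour",
    "A3": "$ 4.25 an hour",
}

# Assumption: every placeholder looks like {A0}, {A1}, ...
PLACEHOLDER = re.compile(r"\{(A\d+)\}")


def fill_template(template, args):
    """Replace each {An} placeholder with the text of argument An."""
    return PLACEHOLDER.sub(lambda m: args[m.group(1)], template)


def predicate_parts(template):
    """Return the predicate's word spans, i.e. the text between placeholders."""
    pieces = re.split(r"\s*\{A\d+\}\s*", template)
    return [p for p in pieces if p]


print(fill_template(TEMPLATE, ARGS))
# -> Under the agreement the minimum wage would rise from $ 3.35 an hour to $ 4.25 an hour
print(predicate_parts(TEMPLATE))
# -> ['Under', 'would rise from', 'to']
```

The three predicate pieces show why a single contiguous span could not represent this predicate.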
Requirements:
- python 2.7
- pip 9.x
Installation:
- Install the required packages:
pip install -r ./requirements.txt
- Download the spaCy English model:
python -m spacy download en
Usage:
prop_extraction --in=INPUT_FILE --out=OUTPUT_FILE [--id]
Extract propositions from the given input file and write them to a separate output file.
If both the --in and --out parameters are directories, the script iterates over all *.txt files in the input directory and
writes the corresponding *.prop files to the output directory.
Options:
--in=INPUT_FILE The input file, with one sentence per line.
--out=OUTPUT_FILE The output file, with each extraction on a separate tab-separated line consisting of the original sentence,
predicate template, lemmatized predicate template, argument name, argument value, ...
--id Indicates that each input line has the form sentence-id \t sentence; the id is copied to the output.
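As a rough sketch of preparing input for the options above, the helpers below build input lines (optionally in the `sentence-id \t sentence` form expected with --id) and compute where directory mode would place each output file. The function names and the id scheme (1-based line numbers) are hypothetical choices for illustration, not part of prop_extraction.

```python
import os


def format_input_lines(sentences, with_ids=False):
    """Build input lines: one sentence per line, or
    'sentence-id<TAB>sentence' when --id will be passed."""
    lines = []
    for i, sentence in enumerate(sentences, start=1):
        # Hypothetical id scheme: 1-based line numbers serve as sentence ids.
        lines.append("{}\t{}".format(i, sentence) if with_ids else sentence)
    return lines


def output_path_for(txt_path, out_dir):
    """Directory mode as described above: <in_dir>/name.txt -> <out_dir>/name.prop."""
    name = os.path.splitext(os.path.basename(txt_path))[0]
    return os.path.join(out_dir, name + ".prop")
```

To use it, write `"\n".join(format_input_lines(sentences, with_ids=True)) + "\n"` to a .txt file and pass that file with --in together with --id.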
Each extraction is presented in a tab separated line, consisting of:
- Original sentence (words are separated by a single space, chunks by double spaces)
- Index of the main predicate's chunk
- Predicate template
- Lemmatized predicate
- Argument1 name
- Argument1 value
- ...
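A minimal sketch for reading such a line back into Python, assuming the column order listed above and output produced without --id (where the copied id column goes is not specified here). The parser name and the sample line in the test are made up for illustration. Whether the predicate chunk index is 0- or 1-based is also not stated, so it is returned as a plain integer rather than used to look up a chunk.

```python
def parse_prop_line(line):
    """Parse one tab-separated extraction line into a dict.

    Assumed column order: sentence, predicate chunk index, predicate
    template, lemmatized predicate, then argument name/value pairs.
    """
    fields = line.rstrip("\n").split("\t")
    sentence, pred_chunk, template, lemma = fields[:4]
    rest = fields[4:]
    if len(rest) % 2:
        raise ValueError("expected argument name/value pairs after column 4")
    return {
        "sentence": sentence,
        # Chunks are separated by double spaces, words by single spaces.
        "chunks": sentence.split("  "),
        # Index base (0 or 1) is not documented, so it is left uninterpreted.
        "pred_chunk": int(pred_chunk),
        "template": template,
        "lemma": lemma,
        "args": dict(zip(rest[0::2], rest[1::2])),
    }
```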
See the examples folder for the output on more than 3K sentences from the news domain. To run the extraction on the example sentences:
python ./prop_extraction.py --in=../examples/sentences.txt --out=../examples/sentences.prop
The following projects make use of template-oie: