Chinese | English

OpenUE is a lightweight toolkit for knowledge graph extraction.

GitHub Documentation


Features

  • Knowledge extraction tasks based on pre-trained language models (compatible with models such as BERT and RoBERTa)
    • Named entity extraction
    • Event extraction
    • Slot filling and intent detection
    • More tasks
  • Training and testing interfaces
  • Fast deployment of your extraction models

Environment

  • Python 3.8
  • Dependencies listed in requirements.txt

Architecture

Framework

OpenUE consists of three main modules: models, lit_models, and data.

models module

This module holds the three main models: a relation classification model that works on a single sentence, a named entity recognition model that extracts the entities for a known relation in the sentence, and an inference model that combines the two. They are mainly derived from the pre-trained models defined in the transformers library.
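How the three models fit together can be sketched without any model weights. The snippet below is a toy illustration only: `classify_relations`, `tag_entities`, and `extract_triples` are hypothetical stand-ins (simple string lookups, not OpenUE's API) for the relation classifier, the relation-conditioned entity tagger, and the inference model that chains them.

```python
from typing import Dict, List, Tuple

# Hypothetical cue phrases standing in for trained models (not part of OpenUE).
RELATION_CUES: Dict[str, str] = {
    "was born in": "birthplace",
    "was founded by": "founder",
}


def classify_relations(sentence: str) -> List[str]:
    """Stage 1 stand-in: return every relation whose cue phrase occurs in the sentence."""
    return [rel for cue, rel in RELATION_CUES.items() if cue in sentence]


def tag_entities(sentence: str, relation: str) -> Tuple[str, str]:
    """Stage 2 stand-in: given a known relation, pick its subject and object.

    Here we simply split on the cue phrase; a real model would tag tokens.
    """
    cue = next(c for c, r in RELATION_CUES.items() if r == relation)
    subject, obj = sentence.split(cue, 1)
    return subject.strip(), obj.strip(" .")


def extract_triples(sentence: str) -> List[Tuple[str, str, str]]:
    """Stage 3 stand-in: run stage 1, then stage 2 once per predicted relation."""
    triples = []
    for relation in classify_relations(sentence):
        subject, obj = tag_entities(sentence, relation)
        triples.append((subject, relation, obj))
    return triples


print(extract_triples("Charles Aránguiz was born in Santiago."))
```

The point of the split is that the entity tagger only ever answers a narrow question ("which spans fill this relation?"), which is why the relation is passed in as an argument rather than predicted jointly.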

lit_models module

The code builds on PyTorch Lightning. We define training_step and validation_step in the models, and pytorch_lightning.Trainer uses them to build the training loop automatically on different hardware: a single GPU, multiple GPUs, TPUs, and so on.

Because the training code is hardware-agnostic, the OpenUE training module can be called in many different environments.
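The division of labour between model-defined steps and a generic training loop can be illustrated without PyTorch Lightning installed. The sketch below is a dependency-free toy, not Lightning's implementation: `ToyModule` and `toy_fit` are hypothetical names, and the "model" is a single weight fitted to y = 2x by manual gradient descent.

```python
from typing import List, Tuple

Batch = Tuple[float, float]  # (input x, target y)


class ToyModule:
    """Hypothetical module exposing the two hooks named above."""

    def __init__(self) -> None:
        self.weight = 0.0
        self.lr = 0.1

    def training_step(self, batch: Batch) -> float:
        # Squared-error loss for y ≈ weight * x, followed by one manual gradient step.
        x, y = batch
        error = self.weight * x - y
        self.weight -= self.lr * 2 * error * x
        return error ** 2

    def validation_step(self, batch: Batch) -> float:
        # Same loss, but no parameter update.
        x, y = batch
        return (self.weight * x - y) ** 2


def toy_fit(module: ToyModule, train: List[Batch], val: List[Batch], epochs: int) -> float:
    """Generic loop: it only knows the hook names, not the model internals."""
    val_loss = float("inf")
    for _ in range(epochs):
        for batch in train:
            module.training_step(batch)
        val_loss = sum(module.validation_step(b) for b in val) / len(val)
    return val_loss


module = ToyModule()
final_val_loss = toy_fit(module, train=[(1.0, 2.0), (2.0, 4.0)], val=[(3.0, 6.0)], epochs=50)
print(f"weight={module.weight:.3f}, val_loss={final_val_loss:.6f}")
```

In PyTorch Lightning itself, training_step typically just returns the loss and the Trainer takes care of the backward pass, the optimizer step, and device placement, which is what keeps the model code independent of the hardware.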

data module

The data module holds the dataset-specific processing code. The tokenizer from the transformers library tokenizes the text, and the result is then converted into the features that each dataset requires.
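As a rough picture of what "turning data into features" means, the sketch below builds fixed-length inputs and multi-hot relation labels for one example. It is an assumption-laden toy: tokenization is character-level with a tiny hypothetical vocabulary instead of a transformers tokenizer, and `RELATIONS` and `build_features` are illustrative names, not OpenUE code.

```python
from typing import Dict, List

# Hypothetical label set; a real one would come from the dataset schema.
RELATIONS: List[str] = ["出生地", "出生日期", "国籍"]


def build_features(text: str, predicates: List[str], vocab: Dict[str, int],
                   max_len: int) -> Dict[str, List[int]]:
    """Turn one example into padded ids, an attention mask, and multi-hot relation labels."""
    unk_id, pad_id = vocab["[UNK]"], vocab["[PAD]"]
    # Character-level "tokenization", truncated to max_len.
    ids = [vocab.get(ch, unk_id) for ch in text][:max_len]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return {
        "input_ids": ids + [pad_id] * padding,
        "attention_mask": mask + [0] * padding,
        "labels": [1 if rel in predicates else 0 for rel in RELATIONS],
    }


vocab = {"[PAD]": 0, "[UNK]": 1, "智": 2, "利": 3}
features = build_features("智利人", ["国籍"], vocab, max_len=5)
print(features)
```

Multi-hot labels are used here because a single sentence can express several relations at once, as the ske example in this README does.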

Quick start

Install

Anaconda

conda create -n openue python=3.8
conda activate openue
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia # adjust cudatoolkit to match your GPU driver version
python setup.py install

pip

pip install openue

pip dev

python setup.py develop

How to use

Data is stored as JSON. An example from the ske dataset:

{
	"text": "查尔斯·阿兰基斯(Charles Aránguiz),1989年4月17日出生于智利圣地亚哥,智利职业足球运动员,司职中场,效力于德国足球甲级联赛勒沃库森足球俱乐部",
	"spo_list": [{
		"predicate": "出生地",
		"object_type": "地点",
		"subject_type": "人物",
		"object": "圣地亚哥",
		"subject": "查尔斯·阿兰基斯"
	}, {
		"predicate": "出生日期",
		"object_type": "Date",
		"subject_type": "人物",
		"object": "1989年4月17日",
		"subject": "查尔斯·阿兰基斯"
	}]
}
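To make the format concrete, the sketch below parses a shortened version of the record above and derives character-level BIO tags for the subject and object of each triple. The tag names (`SUB`, `OBJ`) and the helper `bio_tags` are assumptions for illustration, not necessarily the labeling scheme OpenUE uses internally.

```python
import json
from typing import List

# Shortened version of the record above, kept as a JSON string to mimic a data file entry.
record_json = """
{"text": "查尔斯·阿兰基斯,1989年4月17日出生于智利圣地亚哥",
 "spo_list": [{"predicate": "出生地", "object_type": "地点", "subject_type": "人物",
               "object": "圣地亚哥", "subject": "查尔斯·阿兰基斯"}]}
"""


def bio_tags(text: str, subject: str, obj: str) -> List[str]:
    """Character-level tags: B-/I-SUB for the subject, B-/I-OBJ for the object, O elsewhere."""
    tags = ["O"] * len(text)
    for span, name in ((subject, "SUB"), (obj, "OBJ")):
        start = text.find(span)
        if start == -1:  # span not found in the text: leave the tags untouched
            continue
        tags[start] = f"B-{name}"
        for i in range(start + 1, start + len(span)):
            tags[i] = f"I-{name}"
    return tags


record = json.loads(record_json)
for spo in record["spo_list"]:
    tags = bio_tags(record["text"], spo["subject"], spo["object"])
    print(spo["predicate"], list(zip(record["text"], tags)))
```

Note that `str.find` only locates the first occurrence of each span, which is enough for this example but would need refinement for texts that mention an entity more than once.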

Train

Put the data in the ./dataset/ directory before training. If the directory is empty, the scripts below automatically download the dataset and the pre-trained model and then start training. Keep the network connection available during this step so that the downloads do not fail.

# train the ner module
./scripts/run_ner.sh
# train the seq module
./scripts/run_seq.sh

Here a small demo shows the training process briefly; only one batch is trained to keep the demonstration fast.


Notebook quick start

The ske dataset training notebook uses a Chinese dataset as an example to show how to use lit_models, models, and data in OpenUE, so that users can build their own training logic.

The colab quick start shows how to train your OpenUE models quickly on Colab.

Automatic parameter tuning (wandb)

# to enable it, replace the default logger with the wandb logger
import pytorch_lightning as pl

logger = pl.loggers.WandbLogger(project="openue")

Fast deployment

Install torchserve-docker

docker download

Create the handler class corresponding to the model

We have placed the corresponding deployment classes handler_seq.py and handler_ner.py under the deploy folder.

# Use `torch-model-archiver` to pack the files.
# --extra-files must include:
# 	- `config.json`, `setup_config.json`: configuration files
# 	- `vocab.txt`: vocabulary for the tokenizer
# 	- `model.py`: the model code

torch-model-archiver --model-name BERTForNER_en  \
	--version 1.0 --serialized-file ./ner_en/pytorch_model.bin \
	--handler ./deploy/handler_ner.py \
	--extra-files "./ner_en/config.json,./ner_en/setup_config.json,./ner_en/vocab.txt,./deploy/model.py" -f

# Copy the `.mar` file to the model store, then register the model with curl.
# For the seq model, repeat these steps with handler_seq.py and the seq model files.
sudo cp ./BERTForNER_en.mar /home/model-server/model-store/
curl -v -X POST "http://localhost:3001/models?initial_workers=1&synchronous=false&url=BERTForNER_en.mar&batch_size=1&max_batch_delay=200"
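The registration call can also be assembled from Python, which makes the individual query parameters easier to see and change. This is a minimal sketch: `build_register_url` is a hypothetical helper, the address `http://localhost:3001` is copied from the curl command above, the archive name comes from the `--model-name` passed to the archiver, and the snippet only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def build_register_url(management_address: str, mar_file: str, initial_workers: int = 1,
                       synchronous: bool = False, batch_size: int = 1,
                       max_batch_delay: int = 200) -> str:
    """Assemble the TorchServe management-API URL used to register a model archive."""
    query = urlencode({
        "initial_workers": initial_workers,
        "synchronous": str(synchronous).lower(),  # lowercase, as in the curl command
        "url": mar_file,
        "batch_size": batch_size,
        "max_batch_delay": max_batch_delay,       # milliseconds to wait while filling a batch
    })
    return f"{management_address}/models?{query}"


url = build_register_url("http://localhost:3001", "BERTForNER_en.mar")
print(url)
# Once the server is running, send it as a POST request,
# e.g. with urllib.request.Request(url, method="POST").
```

Keeping `batch_size` and `max_batch_delay` as named arguments makes it straightforward to experiment with server-side batching without editing a long query string by hand.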

Members

Zhejiang University: 张宁豫、谢辛、毕祯、王泽元、陈想、余海阳、邓淑敏、叶宏彬、田玺、郑国轴、陈华钧

Alibaba DAMO Academy: 陈漠沙、谭传奇、黄非


Citation

If you use or extend our work, please cite the following paper:

@inproceedings{zhang-2020-opennue,
    title = "{O}pen{UE}: An Open Toolkit of Universal Extraction from Text",
    author = "Ningyu Zhang and Shumin Deng and Zhen Bi and Haiyang Yu and Jiacheng Yang and Mosha Chen and Fei Huang and Wei Zhang and Huajun Chen",
    year = "2020",
}