Recently, data is very helpful. From data, we can mining value and build very application. With motivation, in this project we build a tool about information extraction filed. It can extract information in a text and detect relation between them.
We design system follow by pipeline architecture with 3 component:
- Corefence Resolution
- Named Entity Recognition
- Relation Extraction
pip install -r requirements.text
You can install docker by following:
export PYTHONPATH=./
Neo4j uses a property graph database model. A graph data structure consists of nodes (discrete objects) that can be connected by relationships. We use neo4j to save iformation after extraction.
sudo docker-compose up
Setup config pipeline at file: api/configs/pipeline_config.yaml
:
VERSION: v0.0.1
LANGUAGE: en
GENERAL:
model_dir: ./models
PIPELINE:
COREF: null
NER:
name: BertNER
package: src.tagger.BertNER
params:
model_name_or_path: models/bert-ner
max_seq_length: 256
REL:
name: BertRelCLF
package: src.relation_extraction.BertRelCLF
params:
model_name_or_path: models/bert-rel
max_seq_length: 256
A system pipeline has 3 component: Coref, Ner and Rel. You can set name and package that are specific for each component. If set is null then that component is not used.
Now, we support models for each component by following:
- ner:
- BertNer (src.tagger.BertNER)
- LstmNER (src.tagger.LstmNER)
- rel:
- BertRelCLF (src.relation_extraction.BertRelCLF)
We trained each model with Conll2004 dataset you can download from here.
After you have downloaded, you unzip and set params model_name_or_path
to file path of each model folder.
Run service with commandline (run at root project):
export PYTHONPATH=./
cp template.env .env
uvicorn app:app --reload --debug
After you have run, you can visit http://localhost:8000/docs
to try service
from src.db_api.Neo4jDB import Neo4jDB
from src.schema.schema import Relation, Entity
db = Neo4jDB(
uri="neo4j://localhost:7687",
user="neo4j",
password="160199"
)
s_e = Entity(entity="Per", value="Nguyen Van A")
t_e = Entity(entity="Loc", value="Ha Noi")
rel_type = "Live_In"
r = Relation(source_entity=s_e, target_entity=t_e, relation=rel_type)
db.create_relationship(r)
results = db.query_relation_entities(s_e)
for record in results:
print(record)
After add relation to database, you can check in : http://localhost:7474/browser/ with clause:
MATCH (p:Per {value: "Nguyen Van A"}) return p
- Use BertNER
from src.tagger.BertNER import BertNER
model_name_or_path = "models/bert-ner"
ner = BertNER(model_name_or_path=model_name_or_path)
text = "My name is Tung"
out = ner.run(text)
print(out)
- Use LstmNER
from src.tagger import LstmNER
model_name_or_path="models/lstm-ner"
ner = LstmNER.from_pretrained(model_name_or_path=model_name_or_path)
text = "I Love Anna Marry"
output = ner.run(text=text)
print(output)
- Use BertRelCLF
from src.relation_extraction.BertRelCLF import BertRelCLF
from src.data_reader import CoNLLReader
model_name_or_path = "models/bert-rel"
model = BertRelCLF(model_name_or_path=model_name_or_path)
text = "In Indiana , downed tree limbs interrupted power in parts of Indianapolis ."
entities = [{'entity': 'Loc', 'value': 'In', 'start_token': 0, 'end_token': 1},
{'entity': 'Loc', 'value': ',', 'start_token': 2, 'end_token': 3}]
print(text)
out = model.run(text=text, entities=entities)
print(out)
- Use InfoExPipeline
from src.pipelines.InfoExPipeline import InfoExPipeline
from src.utils.utils import load_yaml
config_path = "configs/pipeline_config.yaml"
config = load_yaml(config_path)
pipeline = InfoExPipeline.from_confg(config["PIPELINE"])
text = "My name is Tung , I study at Standford"
output = pipeline.run(text)
print(output)
You can play around example in folder example. We provide example code to run and train each component
CoNLL04 data: link