Named Entity Recognition for Brazilian Portuguese

This repository is a study of applying Named Entity Recognition (NER) to detect proper nouns in Portuguese text.

The study was carried out on the CE-DOHS corpus (Corpus Eletrônico de Documentos Históricos do Sertão).

CE-DOHS was preprocessed and then annotated using Label Studio. The study aimed to measure the NER F-score of a BiLSTM-CRF deep learning model.
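As an illustration of what the annotation step produces for a sequence tagger, the sketch below turns character-level span annotations into token-level BIO tags. It is a minimal example under stated assumptions, not the code used in the notebook: the function name `spans_to_bio`, the whitespace tokenization, the `(start, end, label)` span layout, and the `PER` label are all hypothetical choices made for this example.

```python
import re


def spans_to_bio(text, spans):
    """Convert character spans into token-level BIO tags.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Tokens are whitespace-separated chunks (an assumption for this sketch).
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if the two character ranges overlap.
            if tok_start < end and tok_end > start:
                # The token covering the span start opens the entity (B-),
                # later tokens continue it (I-).
                prefix = "B-" if tok_start <= start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical sentence and label, for illustration only.
text = "Carta de Maria Joaquina para João"
name1 = text.index("Maria Joaquina")
name2 = text.index("João")
spans = [
    (name1, name1 + len("Maria Joaquina"), "PER"),
    (name2, name2 + len("João"), "PER"),
]
tokens, tags = spans_to_bio(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token ends up paired with one tag, which is the usual input format for training a sequence-labeling model.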

The whole process can be found in the Full Pipeline notebook.

The F-score obtained was 0.97.
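For readers unfamiliar with the metric, here is a minimal sketch of how an F-score can be computed for NER from BIO tag sequences. It assumes entity-level exact matching (an entity counts as correct only if its boundaries and label both match); the README does not say whether the reported score is entity-level or token-level, so treat this as one possible reading. The function names and the `PER` label are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, label) entities in a BIO tag sequence."""
    entities = set()
    start, label = None, None
    # A trailing "O" acts as a sentinel that closes any open entity.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, label))
            start, label = None, None
        # "B-" always opens an entity; a stray "I-" opens one too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return entities


def f1_score(gold_tags, pred_tags):
    """Entity-level F1: harmonic mean of precision and recall."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model finds one of the two gold entities.
gold = ["O", "B-PER", "I-PER", "O", "B-PER"]
pred = ["O", "B-PER", "I-PER", "O", "O"]
print(round(f1_score(gold, pred), 3))
```

In this example precision is 1.0 and recall is 0.5, so the score is their harmonic mean.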

How to use: Notebook version

  1. conda create -n NameAnonPT
  2. conda activate NameAnonPT
  3. pip install -r requirements.txt
  4. jupyter lab

How to use: Airflow Version

Note: this version requires Docker and Docker Compose.

  1. Go to the Airflow folder
  2. docker-compose up
  3. Open localhost:8080 in a browser
  4. Run the cartasAnonPT DAG

Some screenshots are shown below:

DAG tree