BERT-Sequence-Labeling

This repository integrates Hugging Face models into an end-to-end pipeline for sequence labeling. Here is a complete list of the available models.

Install

$ git clone https://github.com/avramandrei/BERT-Sequence-Labeling.git
$ cd BERT-Sequence-Labeling
$ conda create -n bert-sl python=3.10
$ conda activate bert-sl
$ pip install -r requirements.txt

Input Format

The files used for training, validation and testing must be in the following format:

Each line contains a token and its label, separated by a space.
Sentences or documents are separated from each other by a blank line.

The labels can be whatever you want.

This O
is O
the O
first O
sentence B-Label1
. I-Label1

This B-Label2
is I-Label2
the O
second O

There can be other columns in the file, and the token-label order can be switched. All that matters is that you use the correct column indices (starting from 0) when calling the scripts, and that you keep the sentences or documents separated by a blank line.
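
As an illustration of the format described above, here is a minimal sketch of a reader that groups lines into sentences and picks the token and label by column index. The function name `read_sentences` is hypothetical and the whitespace splitting is an assumption; the repository's own loading code may work differently.

```python
from typing import List, Tuple


def read_sentences(
    text: str, tokens_column: int = 0, predict_column: int = 1
) -> List[List[Tuple[str, str]]]:
    """Parse column-formatted text into sentences of (token, label) pairs.

    Hypothetical helper for illustration; not part of the repository.
    """
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence or document.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()  # assumption: columns are whitespace-separated
        current.append((columns[tokens_column], columns[predict_column]))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


example = """This O
is O
the O
first O
sentence B-Label1
. I-Label1

This B-Label2
is I-Label2
the O
second O
"""

sentences = read_sentences(example, tokens_column=0, predict_column=1)
print(len(sentences))   # number of sentences found
print(sentences[0][4])  # fifth (token, label) pair of the first sentence
```

If the label came first on each line, the same data would be read with `tokens_column=1` and `predict_column=0`.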

Training

To train a model, use the train.py script. This will start training a model that will predict the labels of the column specified by the [predict_column] argument.

python3 train.py [path_train_file] [path_dev_file] [tokens_column] [predict_column] [lang_model_name]
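
To make the argument order concrete, the sketch below assembles a training command for the two-column example from the Input Format section (token in column 0, label in column 1). The file paths are hypothetical placeholders, and `build_train_command` is an illustrative helper, not part of the repository; `bert-base-cased` is one of the models listed under Results.

```python
import shlex
from typing import List


def build_train_command(
    train_file: str,
    dev_file: str,
    tokens_column: int,
    predict_column: int,
    lang_model_name: str,
) -> List[str]:
    """Return the train.py invocation as an argument list (hypothetical helper)."""
    return [
        "python3",
        "train.py",
        train_file,
        dev_file,
        str(tokens_column),   # 0-based index of the token column
        str(predict_column),  # 0-based index of the label column
        lang_model_name,
    ]


# Hypothetical paths; replace them with your own files.
cmd = build_train_command("data/train.txt", "data/dev.txt", 0, 1, "bert-base-cased")
print(shlex.join(cmd))

# To actually launch training from Python, one option would be:
# import subprocess; subprocess.run(cmd, check=True)
```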

Inference

To predict new values, use the predict.py script. This will create a new file by replacing the predicted column of the test file with the predicted values.

python3 predict.py [path_test_file] [model_path] [tokens_column] [predict_column] [lang_model_name]
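
The column replacement described above can be pictured with the following sketch. It is only an illustration of the idea, assuming whitespace-separated columns and one predicted label per non-blank line; `replace_column` is a hypothetical name, and predict.py itself may write its output differently.

```python
from typing import List


def replace_column(text: str, predictions: List[str], predict_column: int = 1) -> str:
    """Return a copy of `text` with one column replaced by predicted labels.

    Hypothetical helper for illustration; not part of the repository.
    """
    lines = text.splitlines()
    n_tokens = sum(1 for line in lines if line.strip())
    if n_tokens != len(predictions):
        raise ValueError("expected exactly one prediction per non-blank line")

    remaining = iter(predictions)
    out_lines = []
    for line in lines:
        if not line.strip():
            # Keep blank lines so sentence boundaries are preserved.
            out_lines.append("")
            continue
        columns = line.split()  # assumption: whitespace-separated columns
        columns[predict_column] = next(remaining)
        out_lines.append(" ".join(columns))
    return "\n".join(out_lines) + "\n"


test_text = "This O\nis O\n\nSecond O\n"
predicted = ["B-Label2", "I-Label2", "O"]
result = replace_column(test_text, predicted, predict_column=1)
print(result)
```

Any other columns in the test file are carried over unchanged; only the column at `predict_column` is overwritten.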

Results

FGCR

See data/fgcr for the data and attribution.

model	macro_f1
bert-base-cased	73.23
roberta-base	74.1
