This is the repository of BERT ParsCit, under active development at the National University of Singapore (NUS). The project was built upon a template by ashleve. BERT ParsCit is a BERT-based version of Neural ParsCit, built by researchers at WING@NUS.
# clone project
git clone https://github.com/ljhgabe/BERT-ParsCit
cd BERT-ParsCit
# [OPTIONAL] create conda environment
conda create -n myenv python=3.8
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
The current doc2json tool is used to convert PDFs to JSON. It uses Grobid to first process each PDF into XML, then extracts paper components from the XML.
To set up Doc2Json, run:
sh bin/doc2json/scripts/run.sh
This will set up Doc2Json and Grobid. After installation, the Grobid server is started in the background by default.
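Before converting PDFs, you may want to confirm the Grobid server is actually reachable. A minimal sketch, assuming Grobid runs on its default port 8070 (the helper name `grobid_is_alive` is ours, not part of the project):

```python
from urllib.request import urlopen
from urllib.error import URLError

def grobid_is_alive(base_url: str = "http://localhost:8070") -> bool:
    """Return True if the Grobid service answers its isalive endpoint."""
    try:
        # Grobid exposes a lightweight health-check endpoint at /api/isalive.
        with urlopen(f"{base_url}/api/isalive", timeout=5) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Server not running, wrong port, or network error.
        return False

if __name__ == "__main__":
    print("Grobid up:", grobid_is_alive())
```

If this returns False, re-run the setup script or check the Grobid logs before calling the PDF pipeline.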
from src.pipelines.bert_parscit import predict_for_string, predict_for_text, predict_for_pdf
str_result = predict_for_string(
"Calzolari, N. (1982) Towards the organization of lexical definitions on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, Charles University, Prague, pp.61-64.")
text_result = predict_for_text("test.txt")
pdf_result = predict_for_pdf("test.pdf")
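`predict_for_text` takes a path to a plain-text file. As a hedged sketch, assuming one reference string per line (an assumption, not documented behavior), you could prepare such a file from a list of strings before calling the pipeline; the helper `write_reference_file` and the file name are placeholders of ours:

```python
from pathlib import Path

def write_reference_file(references: list[str], path: str = "refs.txt") -> str:
    """Write one reference string per line to a UTF-8 text file.

    The resulting path can then be passed to predict_for_text(path).
    """
    Path(path).write_text("\n".join(references) + "\n", encoding="utf-8")
    return path

refs = [
    "Calzolari, N. (1982) Towards the organization of lexical definitions "
    "on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, "
    "Charles University, Prague, pp.61-64.",
]
path = write_reference_file(refs)
```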
Train the model with the default configuration:
# train on CPU
python train.py trainer=cpu
# train on GPU
python train.py trainer=gpu
Train the model with an experiment configuration chosen from configs/experiment/:
python train.py experiment=experiment_name.yaml
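As a hedged sketch of what a file under configs/experiment/ might look like, based on the ashleve template layout this project was built upon (every key and value below is an assumption for illustration, not the project's actual config):

```yaml
# @package _global_
# Hypothetical experiment config: configs/experiment/example.yaml

defaults:
  # Swap in alternative component configs from the main config groups.
  - override /trainer: gpu

# Values below override the defaults loaded above.
trainer:
  max_epochs: 20

datamodule:
  batch_size: 64
```

Running `python train.py experiment=example.yaml` would then apply these overrides on top of the default configuration.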
You can override any parameter from the command line like this:
python train.py trainer.max_epochs=20 datamodule.batch_size=64
To show the full stack trace for errors that occur during training or testing:
HYDRA_FULL_ERROR=1 python train.py