BERT ParsCit

Description

This is the repository for BERT ParsCit, which is under active development at the National University of Singapore (NUS). The project is built on a template by ashleve. BERT ParsCit is a BERT-based version of Neural ParsCit, built by researchers at WING@NUS.

Installation

# clone project
git clone https://github.com/ljhgabe/BERT-ParsCit
cd BERT-ParsCit

# [OPTIONAL] create conda environment
conda create -n myenv python=3.8
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Example usage

from bert_parscit import predict_for_text

result = predict_for_text("Calzolari, N. (1982) Towards the organization of lexical definitions on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, Charles University, Prague, pp.61-64.")

How to train

Train model with default configuration

# train on CPU
python train.py trainer.gpus=0

# train on GPU
python train.py trainer.gpus=1

Train model with chosen experiment configuration from configs/experiment/

python train.py experiment=experiment_name.yaml

You can override any parameter from the command line like this:

python train.py trainer.max_epochs=20 datamodule.batch_size=64
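
The overrides above use dotted `group.key=value` pairs. As a rough illustration of what such an override means for a nested configuration, here is a minimal sketch. It is not Hydra's actual implementation; the `apply_overrides` helper and the default values in `defaults` are hypothetical and chosen only for the example.

```python
import ast
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `a.b=value` style overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries named by the dotted key.
        for name in parents:
            node = node.setdefault(name, {})
        # Interpret numbers and other Python literals; keep anything else as text.
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value
        node[leaf] = value
    return result


# Hypothetical defaults, standing in for the values under configs/.
defaults = {
    "trainer": {"max_epochs": 10, "gpus": 0},
    "datamodule": {"batch_size": 32},
}
config = apply_overrides(
    defaults, ["trainer.max_epochs=20", "datamodule.batch_size=64"]
)
print(config)
```

Only the keys named on the command line change; everything else keeps its configured default, which is why several overrides can be combined in one invocation.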

To show the full stack trace for errors that occur during training or testing:

HYDRA_FULL_ERROR=1 python train.py

How to Parse Reference Strings from a PDF

Setup Doc2Json

First, prepare the environment:

cd ./tools
python setup.py develop

The current grobid2json tool uses Grobid to first process each PDF into XML, then extracts paper components from the XML.

Install Grobid

You will need to have Java installed on your machine. Then, you can install your own version of Grobid and get it running, or you can run the following script:

bash tools/scripts/setup_grobid.sh

This will set up Grobid, currently hard-coded as version 0.6.1. Then run:

bash tools/scripts/run_grobid.sh

to start the Grobid server. Don't worry if it gets stuck at 87%; this is normal and means Grobid is ready to process PDFs.
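
Because the startup progress stops at 87%, it can be handy to confirm from Python that the server is actually accepting requests. The sketch below assumes Grobid's usual defaults: port 8070 and the `/api/isalive` health endpoint. If run_grobid.sh configures a different host or port, adjust `base_url`. The `grobid_is_alive` helper is hypothetical and not part of this repository.

```python
import http.client
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if a Grobid server answers its health check at base_url."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error: treat as not ready.
        return False


if __name__ == "__main__":
    print("Grobid reachable:", grobid_is_alive())
```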

Extract Reference Strings from a PDF File

Use the pdf2text.py script to extract the strings you need from a PDF. For example, to get reference strings, try:

python pdf2text.py --input_file tools/tests/pdf/2020.acl-main.207.pdf --reference \
    --output_dir output/ --temp_dir temp/

With --reference, this generates a text file of reference strings in the specified output_dir, and the JSON version of the original PDF is saved in the specified temp_dir. By default, output_dir is output/ and temp_dir is temp/, both relative to the current directory.
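
To process a whole folder of PDFs, the same command can be driven from Python. This is a minimal sketch that only reuses the flags shown above (`--input_file`, `--reference`, `--output_dir`, `--temp_dir`). The helper names `build_pdf2text_command` and `extract_references` are hypothetical, and the sketch assumes it is run from the repository root with the Grobid server already started.

```python
import subprocess
import sys
from pathlib import Path


def build_pdf2text_command(pdf_path, output_dir="output/", temp_dir="temp/"):
    """Build the pdf2text.py invocation for one PDF as an argument list."""
    return [
        sys.executable, "pdf2text.py",
        "--input_file", str(pdf_path),
        "--reference",
        "--output_dir", output_dir,
        "--temp_dir", temp_dir,
    ]


def extract_references(pdf_dir, output_dir="output/", temp_dir="temp/"):
    """Run pdf2text.py on every *.pdf in pdf_dir and return the processed paths."""
    processed = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=True stops at the first PDF that fails to convert.
        subprocess.run(
            build_pdf2text_command(pdf_path, output_dir, temp_dir), check=True
        )
        processed.append(pdf_path)
    return processed
```

For example, `extract_references("tools/tests/pdf")` would process the sample PDFs one at a time.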

Parse Reference Strings from a Text File

To predict the reference string tags, try:

from bert_parscit import predict_for_file
res = predict_for_file("output/N18-3011_ref.txt", output_dir="result")

The prediction result is saved in output_dir. If output_dir is unspecified, the file is written to the result/ directory relative to the current directory.
