This repository was created as part of the work in https://github.com/Lathashree01/LlamaClinicalRE. In that project, we perform domain-adaptive pretraining of LLaMA models on the clinical domain and evaluate clinical language understanding on downstream evaluation datasets.
**This repository is used to evaluate the following models on the n2c2 2018 relation extraction (RE) dataset:**
- LLaMA 1
- LLaMA 2
- Our clinical LLaMA models
(Please download the original LLaMA 1 and LLaMA 2 models separately.)
Original repository: https://github.com/uf-hobi-informatics-lab/ClinicalTransformerRelationExtraction
This package is developed for researchers to easily use state-of-the-art transformer models for extracting relations from clinical notes. No prior knowledge of transformers is required. We handle the whole process, from data preprocessing to training to prediction.
The package is built on top of the Transformers library developed by HuggingFace. The packages required to run the project are listed in requirement.txt.
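To set up the environment, a standard pip install should work (a minimal sketch, assuming the file sits at the repository root):

pip install -r requirement.txt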
- prerequisite
The package is only for relation extraction, so the entities must be provided. You have to run NER first to obtain all entities, then run this package to get the end-to-end relation extraction results.
- data format
See the sample_data directory (train.tsv and test.tsv) for the train and test data formats.
The sample data is a small subset of the data prepared from the 2018 UMass MADE 1.0 challenge corpus.
# data format: tsv file with 8 columns:
1. relation_type: adverse
2. sentence_1: ALLERGIES : [s1] Penicillin [e1] .
3. sentence_2: [s2] ALLERGIES [e2] : Penicillin .
4. entity_type_1: Drug
5. entity_type_2: ADE
6. entity_id_1: T1
7. entity_id2: T2
8. file_id: 13_10
Note:
1) The entity between [s1] and [e1] is the first entity in a relation; the second entity in the relation is between [s2] and [e2].
2) Even if the two entities are in the same sentence, they must still be provided separately (one tagged copy per entity, as sentence_1 and sentence_2).
3) In test.tsv, you can set all labels to neg, no_relation, or any placeholder, because the labels are not used during prediction.
4) We recommend evaluating the test performance in a separate step based on the predictions (see **post-processing**).
5) We recommend using the official evaluation scripts so that the reported results are reliable.
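For a quick sanity check of your own files, a short pandas sketch like the one below can be used. This is illustrative only: it assumes the TSV has no header row and simply reuses the column names listed above; adjust it if your files differ.

# illustrative sketch: inspect a relation-extraction TSV (assumes no header row)
import pandas as pd

cols = ["relation_type", "sentence_1", "sentence_2",
        "entity_type_1", "entity_type_2", "entity_id_1", "entity_id2", "file_id"]
df = pd.read_csv("sample_data/train.tsv", sep="\t", header=None, names=cols)

print(df["relation_type"].value_counts())               # label distribution
assert df["sentence_1"].str.contains(r"\[s1\]").all()   # every row tags entity 1
assert df["sentence_2"].str.contains(r"\[s2\]").all()   # every row tags entity 2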
- preprocess data (see the preprocess.ipynb notebook for more details on usage)
We do not provide a standalone script for training and test data generation.
Instead, a Jupyter notebook that preprocesses the 2018 n2c2 data is provided as an example.
You can follow this example to generate your own dataset (a minimal tag-wrapping sketch is also shown after the special tags section below).
- special tags
We use 4 special tags to identify the two entities in a relation
# The default tags we defined in the repo are
EN1_START = "[s1]"
EN1_END = "[e1]"
EN2_START = "[s2]"
EN2_END = "[e2]"
If you need to customize these tags, you can change them in config.py
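To illustrate how these tags end up in sentence_1 and sentence_2 (for example, when preparing your own data from NER output), here is a minimal sketch. The tag_entity helper and the character offsets are hypothetical and only for illustration; they are not part of the package.

# hypothetical helper: wrap an entity span with the start/end tags defined above
EN1_START, EN1_END = "[s1]", "[e1]"
EN2_START, EN2_END = "[s2]", "[e2]"

def tag_entity(sentence, start, end, open_tag, close_tag):
    """Insert tags around the character span [start, end) of an entity."""
    return f"{sentence[:start]}{open_tag} {sentence[start:end]} {close_tag}{sentence[end:]}"

sent = "ALLERGIES : Penicillin ."
# character offsets of the two entities (normally produced by your NER step)
sentence_1 = tag_entity(sent, 12, 22, EN1_START, EN1_END)  # tags "Penicillin"
sentence_2 = tag_entity(sent, 0, 9, EN2_START, EN2_END)    # tags "ALLERGIES"
print(sentence_1)  # ALLERGIES : [s1] Penicillin [e1] .
print(sentence_2)  # [s2] ALLERGIES [e2] : Penicillin .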
- Training and prediction
Please refer to the original repository for full details of the parameters. Some additional parameters related to LoRA (PEFT) have been added in this project; see the run script for those flag details.
sh run_train_test.sh
Please note: sample SLURM scripts are also provided.
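The provided sample scripts are the reference. Purely as an illustration, a minimal SLURM submission script wrapping the training command could look like the following; the job name, partition, and resource values are assumptions to adapt to your cluster.

#!/bin/bash
# minimal illustrative SLURM wrapper (partition and resource values are assumptions)
#SBATCH --job-name=clinical-re
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH --time=24:00:00

sh run_train_test.sh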
- post-processing (we only support transformation to brat format)
data_dir=./sample_data
pof=./predictions.txt
python src/data_processing/post_processing.py \
--mode mul \
--predict_result_file $pof \
--entity_data_dir ./test_data_entity_only \
--test_data_file ${data_dir}/test.tsv \
--brat_result_output_dir ./brat_output
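The converter writes brat standoff files (one .ann file per note) to the output directory. As a rough illustration only, with entity IDs, offsets, and relation type names that depend on your data, a converted file could contain lines like:

T1	Drug 442 452	Penicillin
T2	ADE 500 504	rash
R1	ADE-Drug Arg1:T2 Arg2:T1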
- Running evaluation script (n2c2 2018 challenge)
python src/brat_eval.py --f1 /path_to_test_files_brat/ \
--f2 path_to_brat_output -v
--f1 -> folder path to the gold-standard brat files
--f2 -> folder path to the model-predicted brat files
This project is mainly developed based on the ClinicalTransformerRelationExtraction open-source repository linked above.
Please raise a GitHub issue if you have a problem, or check the original repository.