This repository contains all the necessary scripts for training and preprocessing the data for a Variationally Regularized Graph Neural Network (VGNN) model1 with ClinicalBERT2 features integrated. It is designed to handle and process EHR, specifically the MIMIC-III dataset, and utilizes features extracted through ClinicalBERT as additional input for the VGNN.
Below is a brief overview of the main components of the repository:
Unified-ClinicalBERT-VGNN
├── model.py
├── train_combo.py
└── utils.py
preprocess
├── preprocess_GNN
│ └── preprocess_mimic.py
├── preprocess_BERT
│ ├── mk_X_BERT_matched.py
│ └── preprocess_NOTEEVENTS.py
└── get_cls.py
BERT
└── train_BERT.py
data
└── MIMIC-III data
Unified-ClinicalBERT-VGNN
: Folder containing training script for ClinicalBert and VGNN combomodel.py
: This file contains the main structure of the Graph Neural Network (GNN) model.train_combo.py
: This script is used for training the GNN model.utils.py
: This file contains utility functions used throughout the repository.preprocess
: This directory should contain your preprocessing files for GNN and ClinicalBERT (MIMIC-III dataset).get_cls.py
: A script for obtaining the [CLS] token as embeddings from the MIMIC data to be fed to the GNN model.preprocess_GNN
: This directory contains script (preprocess_mimic.py
) for preprocessing MIMIC-III data for GNN and creating mappings of labels for ClinicalBERT.preprocess_BERT
: This directory contains scripts (mk_X_BERT.py
,mk_X_BERT_matched.py
,preprocess_NOTEEVENTS.py
) for preprocessing data with ClinicalBERT.BERT
: This directory contains script (train_BERT.py
) which is used for fine-tuning training the ClinicalBERT model.data
: This directory should contain your input data (MIMIC-III dataset).
First, make sure you download your MIMIC-III data into the data
directory. The scripts are designed to read from this location. Then, follow the sequence of steps below:
- Run
preprocess_mimic.py
inpreprocess_GNN
, andpreprocess_NOTEEVENTS.py
, andmk_X_BERT_matched.py
inpreprocess_BERT
to preprocess your MIMIC-III data. - Run
train_BERT.py
inBERT
to finetune the ClinicalBERT for the same prediction task as the VGNN. - Run
get_cls.py
inpreprocess
to obtain the additional features generated by ClinicalBERT. - Run
train_combo.py
inUnified-ClinicalBERT-VGNN
to train your ClinicalBERT + VGNN Combo model.
Please ensure you have the necessary dependencies installed. If not, install them with:
pip install -r requirements.txt
Footnotes
-
This code was modified from the following repo https://github.com/NYUMedML/GNN_for_EHR/tree/master ↩
-
This code was modified from the following repo https://github.com/EmilyAlsentzer/clinicalBERT ↩