This repo contains code for our HiCu DLH 2023 paper HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding.
Install the following packages to run the code in this repository:
- gensim==4.3.0
- nltk==3.6.5
- numpy==1.23.5
- pandas==1.4.2
- scikit_learn==1.2.0
- scipy==1.10.0
- torch==1.13.1
- tqdm==4.62.3
- transformers==4.26.1
- geoopt==0.5.0
pip install -r requirements.txt
We use MIMIC-III for model training and evaluation. We use the same data preprocessing code as MultiResCNN. To set up the dataset, place the MIMIC-III files into /data
as shown below:
data
| D_ICD_DIAGNOSES.csv
| D_ICD_PROCEDURES.csv
└───mimic3/
| | NOTEEVENTS.csv
| | DIAGNOSES_ICD.csv
| | PROCEDURES_ICD.csv
| | train_full_hadm_ids.csv
| | train_50_hadm_ids.csv
| | dev_full_hadm_ids.csv
| | dev_50_hadm_ids.csv
| | test_full_hadm_ids.csv
| | test_50_hadm_ids.csv
The *_hadm_ids.csv
files can be found here.
After setting up the files, run the following command to preprocess the data:
python preprocess_mimic3.py
A large portion of the code in this repository is borrowed from https://github.com/wren93/HiCu-ICD. Thanks to their great work.