MIRACLE is a project that leverages Prodigy to train a model that annotates DailyMed drug indication sections with their corresponding medical context.
- Clone the repository:
git clone https://github.com/MaastrichtU-IDS/MIRACLE/
cd MIRACLE
- Create a .env file:
PRODIGY_KEY=XXXX-XXXX-XXXX-XXXX
PRODIGY_ALLOWED_SESSIONS=USERNAME_1,USERNAME_2
PRODIGY_BASIC_AUTH_PASS=XXXX
- Build and deploy with Docker:
docker compose up -d --build --remove-orphans
- View the logs:
docker compose logs
- Enter the container to run commands:
docker exec -it prodigy-dailymed bash
- Stop and remove the container:
docker compose down
MIRACLE uses the following annotation labels:
- DRUG: The drug name or active ingredient.
- SALT: The salt form of the drug.
- ROUTE: The route of administration (e.g., oral, intravenous, topical).
- FORMULATION: The formulation of the drug (e.g., solution, tablet).
- MECHANISM: The drug's mechanism of action (e.g., protein inhibitor).
- ACTION: The action between the drug and the condition (e.g., treatment of, management of).
- SEVERITY: The severity of the indicated condition (e.g., mild-to-moderate, severe).
- INDICATION: The condition that the drug is indicated for (e.g., tremors, pain).
- BASECONDITION: The underlying pathophysiological state of which the indicated condition is a symptom (e.g., Parkinson's, cancer).
- ANATOMY: The specific anatomical part in which the condition is localized (e.g., skin).
- CAUSED_BY: The cause of the condition that is to be treated (e.g., E. coli for an infection).
- SYMPTOM: Symptoms exhibited by the patient but not the treatable indication.
- TARGET_GROUP: The group of patients or individuals for whom a particular drug is intended or designed to be used.
- CO_MORBIDITY: Other conditions that may be present in the target group.
- CO_PRESCRIPTION: Other medications that may be in use by the target group.
- HISTORY: Statements relating to the medical history of the target group.
- TEMPORALITY: Statements relating to the temporality of the treatment.
- INEFFECTIVE: The manner in which the drug is ineffective.
- EFFECT: Intended beneficial effects of the treatment.
- SIDEEFFECT: Negative side effects of the treatment.
- CONTRAINDICATION: Statements specifying when the drug should not be used.
- MEDICAL_CTX: Statements discussing the medical context beyond the labels mentioned above.
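For illustration, here is a minimal sketch of what a labelled example looks like in Prodigy's span-based JSONL format; the sentence and character offsets below are hypothetical and only meant to show how a few of the labels above are applied.

```python
import json

# Hypothetical annotated example in Prodigy's span-based format
# (the text and character offsets are illustrative only).
example = {
    "text": "Aspirin is indicated for the treatment of mild-to-moderate pain.",
    "spans": [
        {"start": 0, "end": 7, "label": "DRUG"},         # "Aspirin"
        {"start": 29, "end": 41, "label": "ACTION"},      # "treatment of"
        {"start": 42, "end": 58, "label": "SEVERITY"},    # "mild-to-moderate"
        {"start": 59, "end": 63, "label": "INDICATION"},  # "pain"
    ],
    "answer": "accept",
}
print(json.dumps(example))
```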
Start by manually annotating samples using the provided labels to create a training dataset. Use the following command:
prodigy ner.manual ner_indications blank:en ./data/dailymed/curation.jsonl --label ./labels.txt
The server will be available at http://localhost:8080 by default, and user-specific annotations can be done at http://localhost:8080/?session=USERNAME. The first time, you need to sign in with the username "prodigy-user" and the PRODIGY_BASIC_AUTH_PASS defined in the .env file.
💾 Make sure to save your progress by clicking the small floppy disk icon or pressing CTRL-S.
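The input stream (./data/dailymed/curation.jsonl) is expected to contain one JSON object per line with a "text" field. A minimal sketch of preparing such a file, with hypothetical sentences (the real file is built from DailyMed indication sections):

```python
import json

# Hypothetical indication sentences used only to illustrate the input format.
texts = [
    "Indicated for the management of severe tremors in patients with Parkinson's disease.",
    "For topical treatment of mild skin infections caused by E. coli.",
]

# Written to a sample path so the real curation file is not overwritten.
with open("sample_curation.jsonl", "w") as f:
    for text in texts:
        f.write(json.dumps({"text": text}) + "\n")
```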
Review the Dataset and Manage Annotations:
List all datasets and sessions:
prodigy stats -ls
Annotations can be exported and imported using the 'db-out' and 'db-in' commands, respectively.
Export annotations:
prodigy db-out ner_indications > data/annotations/ner_indications_annotations.jsonl
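A quick way to sanity-check an export is to load the JSONL and count labels and annotators. A minimal sketch, assuming the export path above and that the records carry Prodigy's usual "spans" and "_session_id" fields:

```python
import json
from collections import Counter

label_counts = Counter()
session_counts = Counter()

with open("data/annotations/ner_indications_annotations.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # Count span labels in accepted examples.
        if record.get("answer") == "accept":
            for span in record.get("spans", []):
                label_counts[span["label"]] += 1
        # _session_id is set when annotating via ?session=USERNAME.
        session_counts[record.get("_session_id", "unknown")] += 1

print("Label counts:", label_counts.most_common())
print("Examples per session:", session_counts.most_common())
```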
Import annotations:
prodigy db-in ner_indications backup/latest.jsonl
Resolve conflicts from multi-user sessions using the review recipe:
prodigy review ner_indications_review ner_indications --label ./labels.txt
Delete unused datasets:
prodigy drop unused_dataset
Model Training
To train the model on the annotations, provide the training dataset and an output directory for the model.
The model is saved as model-best and model-last in the given output directory: model-best is the best-performing version across the training iterations, and model-last is the version from the final iteration.
prodigy train ./models/miracle --ner ner_indications --label-stats
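Once training finishes, the resulting pipeline can be loaded and smoke-tested directly with spaCy; a minimal sketch (the example sentence is hypothetical):

```python
import spacy

# Load the best checkpoint produced by `prodigy train`.
nlp = spacy.load("./models/miracle/model-best")

# Hypothetical indication sentence for a quick smoke test.
doc = nlp("Indicated for the treatment of severe pain in adult patients.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```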
Correcting
Correct the model's suggestions (excluding samples already annotated in the "ner_indications" dataset):
prodigy ner.correct ner_indications ./models/miracle/model-best ./data/dailymed/curation.jsonl --label ./labels.txt --exclude ner_indications
Retraining
After correcting the model's suggestions, retrain using the previous best model as the base and, depending on the model's performance, continue alternating between correction and training.
prodigy train ./models/miracle --ner ner_indications --label-stats --base-model ./models/miracle/model-best
You can use the train-curve recipe to see whether more data improves the model. As a rule of thumb, if accuracy improves within the last 25% of the data, training with more examples will likely result in better accuracy.
prodigy train-curve --ner ner_indications --show-plot
For more ways to use Prodigy, explore the prodigy-recipes repository.
Creating Configuration File
To use an LLM with spaCy, you need a configuration file that tells spacy-llm how to construct a prompt for your task. A sample configuration for OpenAI is provided in 'spacy-llm-config.cfg'.
If you use a vendor like OpenAI as the backend for your LLM, you also need to set up secrets so that you can authenticate yourself:
OPENAI_API_ORG = "org-..."
OPENAI_API_KEY = "sk-..."
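Outside of Prodigy, the same configuration can also be loaded with spacy-llm's assemble helper to run the LLM-backed pipeline directly. A minimal sketch, assuming the OpenAI credentials above are available in the environment and that 'spacy-llm-config.cfg' defines an NER task:

```python
import os
from spacy_llm.util import assemble

# The OpenAI key must be available in the environment (e.g., loaded from .env).
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before running."

# Build the LLM-backed pipeline from the provided configuration.
nlp = assemble("spacy-llm-config.cfg")

# Hypothetical indication sentence for a quick check.
doc = nlp("Indicated for the treatment of hypertension in adult patients.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```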
LLM Prompt
The ner.llm.correct recipe fetches suggestions from a large language model while you annotate, and lets you accept them as correct or curate them manually.
dotenv run -- prodigy ner.llm.correct ner_indications spacy-llm-config.cfg data/dailymed/curation.jsonl
The ner.llm.fetch recipe can fetch a large batch of examples upfront.
dotenv run -- prodigy ner.llm.fetch spacy-llm-config.cfg data/dailymed/curation.jsonl data/model_annotations/annotations_curation_md__gpt3.jsonl
After downloading such a batch of examples you can use ner.manual to correct the annotations.
prodigy ner.manual ner_indications blank:en data/model_annotations/annotations_curation_md__gpt3.jsonl --label labels.txt
The following table provides an overview of the F1 scores achieved by the different models on each dataset.
| Model / Dataset | curation_md | curation_gpt3 | NeuroDKG |
|---|---|---|---|
| miracle | 61.02 | - | 64.89 |
| gpt3_trained | 28.15 | 53.51 | 65.75 |
| GPT-3.5 | 25.46 | - | 71.44 |
The analysis of datasets and models is available in the result folder. The results are organized into two main folders for ease of access:
- Dataset Analysis: The 'dataset_analysis' folder contains files that provide insights into label counts and frequencies for various datasets.
- Model Performance Analysis: The 'model_perf_analysis' folder contains files that present precision, recall, and F1 scores for each label based on the selected model, along with label count and frequency details.
In the analysis files, we follow a specific naming convention to make it easy to identify the dataset and model used for each analysis:
- Files that include 'train' after the dataset name indicate that the analysis pertains to the training portion (80%) of the dataset.
- Files that include 'eval' after the dataset name indicate that the analysis pertains to the evaluation portion (20%) of the entire dataset.
All the datasets used in the analysis can be found in the 'data' folder.
Here are some examples of the naming convention:
- curation_gpt3_train_analysis.csv
  - Dataset: curation_gpt3 (train)
- curation_md_train_analysis.csv
  - Dataset: curation_md (train)
- neuro_dkg__gpt3_trained_analysis.csv
  - Dataset: neuro_dkg
  - Model: gpt3_trained
- curation_md_eval__miracle_analysis.csv
  - Dataset: curation_md (eval)
  - Model: miracle
These analysis files provide valuable insights into the performance of different models on various datasets.
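These CSV files can also be explored programmatically. A minimal sketch with pandas, assuming (hypothetically) that the per-label metrics are stored in columns named label, precision, recall, and f1; adjust the path and column names to match the actual files:

```python
import pandas as pd

# Path and column names are assumptions; check the actual files in the result folder.
df = pd.read_csv("result/model_perf_analysis/curation_md_eval__miracle_analysis.csv")
print(df.head())

# If per-label metrics are present, sort by F1 to surface the weakest labels first.
metric_cols = {"label", "precision", "recall", "f1"}
if metric_cols.issubset(df.columns):
    print(df.sort_values("f1")[["label", "precision", "recall", "f1"]])
```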
For future work, consider the following:
- Performing the annotation process with a team following a well-defined annotation protocol to improve dataset quality.
- Training a new model on the updated annotation dataset while ensuring a high F1 score.
- Annotating all DailyMed indication texts with this new model to create a comprehensive database.