A curated list of awesome healthcare taxonomies and knowledge graphs. We may have missed important software or literature; please feel free to open an issue with any suggestions.
What are the differences between ontology and taxonomy? See discussion 🔗 here>>.
Name | Paper | Misc. |
---|---|---|
Mondo Disease Ontology | [Website] | A semi-automatically constructed ontology that merges in multiple disease resources to yield a coherent merged ontology. |
MeSH Ontology | [Website] | MeSH includes the subject headings appearing in MEDLINE/PubMed, the NLM Catalog, and other NLM databases. |
UMLS Semantic Network | [Website] | Broad categories (semantic types) and their relationships (semantic relations) for UMLS Metathesaurus |
SNOMED CT | [Website] | A multilingual, hierarchically organized collection of medical terms providing codes, terms, synonyms and definitions used in clinical documentation and reporting. |
Disease Ontology | The Human Disease Ontology 2022 update (Nucleic Acids Research'22) [Website] | 10,862 disease terms, 22,137 new SubClassOf Axioms |
Gene Ontology | [Citation Policy] [Website] | three ontologies: Molecular Function, Cellular Component, Biological Process |
Cell Taxonomy | Cell Taxonomy: a curated repository of cell types with multifaceted characterization (Nucleic Acids Research'22) [Website] | 3,143 cell types, 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. |
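
The resources above organize terms through is-a (SubClassOf) links. As a minimal sketch of what that hierarchy looks like in code, the snippet below stores a toy taxonomy as a child-to-parent map and walks up to the root. The terms and the single-parent assumption are illustrative only; real ontologies such as Disease Ontology or SNOMED CT allow multiple parents and richer relations.

```python
# Toy single-parent taxonomy: child term -> parent term.
# These terms are hypothetical examples, not taken from any listed resource.
PARENT = {
    "hepatocellular carcinoma": "liver cancer",
    "liver cancer": "cancer",
    "cancer": "disease",
    "asthma": "respiratory disease",
    "respiratory disease": "disease",
}


def ancestors(term, parent=PARENT):
    """Return the chain of is-a ancestors from the nearest parent up to the root."""
    path = []
    while term in parent:
        term = parent[term]
        path.append(term)
    return path


def is_a(term, candidate, parent=PARENT):
    """True if `candidate` is an ancestor of `term` in the toy taxonomy."""
    return candidate in ancestors(term, parent)


print(ancestors("hepatocellular carcinoma"))
```

A query such as `is_a("asthma", "cancer")` then reduces to a membership check on the ancestor chain.
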
Name | Paper | Domain | Scale | Data Sources |
---|---|---|---|---|
General KGs | ||||
ClinicalKG (CKG) | Clinical Knowledge Graph Integrates Proteomics Data into Clinical Decision-Making (bioRxiv) [Website] [Code] | clinical, laboratory and imaging data, multiomics data, and EHRs | 16 million nodes and 220 million relationships | integrates 25 KGs and 10 ontologies (taxonomies) |
MSI | Identification of disease treatment mechanisms through the multiscale interactome (Nature Communications'21) [Code] | Drugs, Proteins, Diseases, Biological Functions, Gene | 1,661 drugs, 840 diseases, 17,660 proteins, 9,798 biological functions | |
Hetionet | Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes (PLOS Computational Biology'15) [Website] | 11 types of nodes, 24 types of edges | combines information from 29 public databases. 47,031 nodes (11 types) and 2,250,197 relationships (24 types) | Entrez Gene, DrugBank, Uberon, Disease Ontology, MeSH ontology, SIDER, UMLS, Gene Ontology, WikiPathways, Reactome, Pathway Interaction Database, DrugCentral |
iBKH | iBKH: The integrative Biomedical Knowledge Hub (iScience'23) [Code] | Anatomy, Disease, Drug, Gene, Molecule, Symptom, Dietary Supplement Ingredient/Product, Therapeutic Class, Pathway, Side-Effect | 2M entities, 48M relations | Integrates 18 public data sources |
PrimeKG | Building a knowledge graph to enable precision medicine (Scientific Data'23) | Biological process, Protein, Disease, Phenotype, Anatomy, Molecular function, Drug, Cellular component, Pathway, Exposure | 129,375 nodes, 4,050,249 edges | integrates 20 high-quality resources |
Disease-specific KGs | ||||
DRKG | [Blog article'22] [Code] | genes, compounds, diseases, biological processes, side effects and symptoms focusing on drug repurposing for COVID-19. | 97,238 entities belonging to 13 entity-types; and 5,874,261 triplets belonging to 107 edge-types. | DrugBank, Hetionet, GNBR, STRING, IntAct and DGIdb, and COVID-19 literature. |
COVID-KG | COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation (NAACL'21) [Website] | Gene, Disease, Chemical, Organism focusing on COVID-19 | 50,752 Gene nodes, 10,781 Disease nodes, 5,738 Chemical nodes, and 535 Organism nodes; 133 relation types | COVID scientific literature, and existing CTD, MeSH |
KGHC | KGHC: a knowledge graph for hepatocellular carcinoma (BMC Medical Informatics and Decision Making'20) | focusing on hepatocellular carcinoma | 5,028 entities and 13,296 triples | SemMedDB, Literature, Clinical Trials |
Drug-specific KGs | ||||
repoDB | A Standard Database for Drug Repositioning (Scientific Data'17) [Website] | Drug, Disease | 1,571 drugs, 2,051 diseases | N/A |
DrugBank | DrugBank: a comprehensive resource for in silico drug discovery and exploration (Nucleic Acids Research'06) [Website] | Drug, and drug target (i.e. sequence, structure, pathway) | 15,686 drugs, 5,296 non-redundant protein sequences | N/A |
DrugCentral | DrugCentral: online drug compendium (Nucleic Acids Research'16) [Website] | Drug, Target, Disease, Pharmacologic action, Active Ingredients | 112,359 FDA drug labels, 4,927 Active Ingredients, 137,693 Pharmaceutical formulations | N/A |
Protein-specific KGs | ||||
The Human Protein Atlas | [Website] | Proteins, Genes, Tissues, Cell, Pathology, Disease | 27,520 antibodies targeting 17,288 unique proteins | N/A |
Proteinarium | Proteinarium: Multi-sample protein-protein interaction analysis and visualization tool (Genomics'20) [Website] | multi-sample protein-protein interaction | TB Release |
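
The Scale column above reports counts of nodes, edges, and their types. As a rough sketch of how such figures can be derived, the snippet below treats a KG as a list of (head, relation, tail) triples and tallies node and edge types. The triples and the `Type::name` identifier convention are assumptions made for illustration; each listed KG ships its own file format and schema.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical toy triples; the "Type::name" ID convention is an assumption.
TRIPLES = [
    Triple("Compound::aspirin", "treats", "Disease::headache"),
    Triple("Compound::aspirin", "binds", "Gene::PTGS2"),
    Triple("Gene::PTGS2", "associates", "Disease::headache"),
    Triple("Compound::ibuprofen", "binds", "Gene::PTGS2"),
]


def node_type(node_id):
    """Extract the type prefix from a 'Type::name' identifier."""
    return node_id.split("::", 1)[0]


def summarize(triples):
    """Count distinct nodes, edges, and how many fall under each type."""
    nodes = {n for t in triples for n in (t.head, t.tail)}
    return {
        "nodes": len(nodes),
        "edges": len(triples),
        "node_types": dict(Counter(node_type(n) for n in nodes)),
        "edge_types": dict(Counter(t.relation for t in triples)),
    }


print(summarize(TRIPLES))
```
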
Interested in the interaction between Large Language Models and KB? See this amazing resource 🔗 here>>.
Name | Paper | Used Resources |
---|---|---|
LM-Bio-KGC | Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (AKBC'21) [Code] | repoDB, MSI, Hetionet |
AutoRD | AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models (arXiv'24) [Code] | RareDis-v1 |
Problem Definition: Given two ontologies
Name | Paper | Datasets | Notes |
---|---|---|---|
LogMap | LogMap: Logic-based and Scalable Ontology Matching (ISWC'11) [Github] | FMA-NCI, FMA-SNOMED, SNOMED-NCI, Mouse-NCIAnat | a highly scalable ontology matching system in Java |
PARIS | PARIS: Probabilistic Alignment of Relations, Instances, and Schema (VLDB'12) [Github] | YAGO-DBpedia | a non-neural entity, relation and ontology alignment system in Java. |
AML | The AgreementMakerLight Ontology Matching System (OTM'13) [Github] | OAEI'2012: FMA-NCI, FMA-SNOMED, SNOMED-NCI | An element-level ontology alignment system in Java. |
OAEI | [Website] | Mondo: OMIM-ORDO, NCIT-DOID; UMLS: SNOMED-FMA, SNOMED-NCIT | Ontology Alignment Evaluation Initiative since 2011 |
MEDTO | MEDTO: Medical Data to Ontology Matching Using Hybrid Graph Neural Networks (KDD'21) | databases: MIMIC-III, MDX; ontology: OAEI | database to ontology matching task |
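
To make the ontology matching task concrete, here is a minimal lexical baseline, assuming each ontology is given as a mapping from concept ID to its labels and synonyms. It pairs concepts whose normalized labels coincide. This is a naive sketch for orientation, not the algorithm used by LogMap, AML, or any other system listed above, and the concept IDs and labels are made up.

```python
import re
from typing import Dict, List, Set, Tuple


def normalize(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical labels compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def build_index(ontology: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized label to the set of concept IDs that carry it."""
    index: Dict[str, Set[str]] = {}
    for concept_id, labels in ontology.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept_id)
    return index


def match(source: Dict[str, List[str]], target: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (source_id, target_id) pairs that share at least one normalized label."""
    target_index = build_index(target)
    pairs = set()
    for source_id, labels in source.items():
        for label in labels:
            for target_id in target_index.get(normalize(label), ()):
                pairs.add((source_id, target_id))
    return sorted(pairs)


# Hypothetical toy ontologies (IDs and labels are illustrative only).
SOURCE = {
    "S1": ["Hepatocellular carcinoma", "HCC"],
    "S2": ["Type 2 diabetes mellitus", "T2DM"],
    "S3": ["Asthma"],
}
TARGET = {
    "T1": ["hepatocellular-carcinoma"],
    "T2": ["Diabetes mellitus, type 2", "t2dm"],
    "T3": ["Migraine"],
}

print(match(SOURCE, TARGET))
```

Exact label matching misses reordered names such as "Diabetes mellitus, type 2" unless a synonym bridges them, which is one reason the systems above add structural and logical reasoning on top of lexical evidence.
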
Problem Definition: Given two KGs
Name | Paper | Baselines | Datasets |
---|---|---|---|
industry eval EA | An Industry Evaluation of Embedding-based Entity Alignment (COLING'20) | BootEA, MultiKE, RDGCN, RSN4EA, PARIS | cross-lingual EA: DBP15K, WK3L60K; cross-KG (DBpedia and Wikidata) EA: DWY15K, DWY100K; MED-BBK-9K: contains two Chinese medical KGs. |
OpenEA | [Code] | 20+ methods | cross-lingual DBpedia: EN-FR, EN-DE; cross-KG: D-W(ikidata), D-Y(AGO) |
UED | Semi-constraint Optimal Transport for Entity Alignment with Dangling Cases (arXiv'22) [Code] | MTransE, JAPE, BootEA, RDGCN, RNM, RAGA, EchoEA, SelfKG, SoTead, UEA, SEU | cross-lingual EA: DBP15K, DBP2.0; cross-lingual medical-KG EA: MedED |
OntoEA | OntoEA: Ontology-guided Entity Alignment via Joint Knowledge Graph Embedding (Findings of ACL 2021) [Code] | OpenEA methods | shared Ontology: EN-FR, EN-DE, MED-BBK; not shared Ontology: D-W |
SapBERT | Self-Alignment Pretraining for Biomedical Entity Representations (NAACL'21) [Code] | BioBERT, BlueBERT, ClinicalBERT, SciBERT, UMLSBERT, PubMedBERT | NCBI, BC5CDR, MedMentions |
HiPrompt | HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting (SIGIR'23) | LogMap, PARIS, AML, SapBERT, SelfKG, MTransE; HiPrompt: LLM-based entity alignment method | SDKG-DO, repoDB-DO |
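
Many of the embedding-based baselines above share a common final step: rank target entities by similarity to each source entity and score the result with Hits@1. The sketch below illustrates only that step, assuming entity embeddings are already available as plain lists of floats. The vectors, entity names, and gold links are invented, and no listed method is reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(src_emb, tgt_emb):
    """For each source entity, pick the target entity with the highest cosine similarity."""
    return {
        s: max(tgt_emb, key=lambda t: cosine(s_vec, tgt_emb[t]))
        for s, s_vec in src_emb.items()
    }


def hits_at_1(predicted, gold):
    """Fraction of gold links whose top-ranked prediction is correct."""
    return sum(predicted.get(s) == t for s, t in gold.items()) / len(gold)


# Hypothetical embeddings for two small KGs.
KG1 = {"a1": [1.0, 0.0, 0.1], "a2": [0.0, 1.0, 0.0], "a3": [0.6, 0.4, 0.0]}
KG2 = {"b1": [0.9, 0.1, 0.0], "b2": [0.1, 0.9, 0.1], "b3": [0.0, 0.1, 1.0]}
GOLD = {"a1": "b1", "a2": "b2", "a3": "b3"}

predicted = align(KG1, KG2)
print(predicted, hits_at_1(predicted, GOLD))
```

In this toy setup `a3` is deliberately embedded close to `b1`, so the nearest-neighbor rule gets one of the three gold links wrong.
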