
Dictionary: Diseases

Lakshmi Devi Priya edited this page Aug 9, 2020 · 17 revisions

Owner :

Priya

Dictionary :

Disease

Find here :

From SPARQL as text file - https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_new.xml

From SPARQL as xml file

i) synonym dictionary - https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/disease_synonym.xml

ii) latest dictionary with ICD-10 codes - https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/disease_icd10.xml

Creation :

How these three disease dictionaries were created is described at

https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_dict.md

Overview :

This dictionary contains the names of diseases that commonly occur, and of diseases that co-occur, during a viral epidemic.

Source :

From Wikidata using Wikidata Query Service (SPARQL).

The latest dictionary: from ICD-10 and SPARQL.
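
As a rough illustration of the Wikidata route, the sketch below builds a SPARQL query for diseases with their English labels, descriptions, alternative labels (synonyms) and ICD-10 codes, plus a Wikidata Query Service request that asks for the XML results format. It is a minimal sketch, not the query actually used for these dictionaries: the class wd:Q12136 (disease), the property wdt:P494 (ICD-10), the function names, the LIMIT and the User-Agent string are all assumptions. Because synonyms and ICD-10 codes are joined in one query, a disease with several of them appears on several result rows.

```python
from urllib.parse import urlencode
from urllib.request import Request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_disease_query(limit: int = 100) -> str:
    """Return a SPARQL query for diseases with labels, synonyms and ICD-10 codes.

    Assumptions: wd:Q12136 is the Wikidata class "disease" and
    wdt:P494 is the ICD-10 identifier property.
    """
    return f"""
SELECT ?disease ?diseaseLabel ?diseaseDescription ?altLabel ?icd10 WHERE {{
  ?disease wdt:P31 wd:Q12136 .
  OPTIONAL {{ ?disease wdt:P494 ?icd10 . }}
  OPTIONAL {{ ?disease skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "en") }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def build_request(query: str,
                  user_agent: str = "disease-dictionary-sketch/0.1 (hypothetical)") -> Request:
    """Build (but do not send) a GET request for the query."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={
        # SPARQL 1.1 Protocol content negotiation: ask for the XML results format
        "Accept": "application/sparql-results+xml",
        "User-Agent": user_agent,
    })
```

Sending it would be a matter of `urllib.request.urlopen(build_request(build_disease_query()))` and saving the response body as the XML file.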

To be done :

Add descriptions in various languages, possibly making the disease dictionary multilingual.
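
To make the dictionary multilingual, the query could request descriptions in several languages (for example schema:description filtered with LANG()); in the W3C SPARQL XML results format each literal then carries an xml:lang attribute. The sketch below groups such descriptions per entity and per language using only the standard library. It is a minimal example assuming result variables named disease and description; those names and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def descriptions_by_language(sparql_xml: str,
                             entity_var: str = "disease",
                             text_var: str = "description") -> dict:
    """Map each entity URI to {language tag: description text}."""
    grouped = defaultdict(dict)
    root = ET.fromstring(sparql_xml)
    for result in root.iter(SPARQL_NS + "result"):
        entity = text = lang = None
        for binding in result.findall(SPARQL_NS + "binding"):
            name = binding.get("name")
            if name == entity_var:
                uri = binding.find(SPARQL_NS + "uri")
                entity = uri.text if uri is not None else None
            elif name == text_var:
                literal = binding.find(SPARQL_NS + "literal")
                if literal is not None:
                    text = literal.text
                    # "und" (undetermined) when the literal has no language tag
                    lang = literal.get(XML_LANG, "und")
        if entity and text:
            grouped[entity][lang] = text
    return dict(grouped)
```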

Issues and doubts

I have updated the synonym dictionary for diseases at https://github.com/petermr/openVirus/blob/master/dictionaries/test/disease_synonym.xml. I looked at a few pages and have some questions about the synonyms created in the dictionary:

  1. The synonyms include some common words, letters and numbers such as 2, Male, face, and neck, X, X-linked and so on. If this dictionary is used in ami search, will it create DataTables that include these common words?
  2. Some words contain special (accented) characters, but they either have a Wikidata ID or appear in the synonyms: for example, Uberkoten (the correct special characters could not be reproduced here) has Wikidata ID Q332590, and Chédiak and François appear in synonyms. Can these be left as they are, or should they be removed manually?
  3. Some entries have a Wikidata ID instead of a name: Q886810 and Q1607642 have no entry name, only the ID itself. Should I remove them manually?
  4. Some synonyms contain bracketed words, e.g. <synonym>Dwarfism : [pitutary] or [hypophyseal (& Lorain - Levi)]</synonym>. Will such a synonym be used as a whole, as separate terms, or both? If only as a whole, might it miss some words or terms?
  5. Some synonyms, such as NOS and X-linked, are repeated in more than two different entries. Should they be removed?
  6. The synonyms also include acronyms such as PAN (Polyarteritis nodosa), DISH (Diffuse Idiopathic Skeletal Hyperostosis), etc.
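
Several of these questions (very short or common synonyms, Wikidata IDs used as names, bracketed synonyms, synonyms repeated across entries) could be flagged automatically before deciding by hand what to remove. The sketch below only reports candidates; it makes no claim about how ami search treats them. The in-memory entry format, the thresholds, the common_words set and the function name are hypothetical.

```python
import re
from collections import Counter

BARE_QID = re.compile(r"Q\d+")       # a name like "Q886810"
BRACKETED = re.compile(r"[\[\]()]")  # synonyms containing [ ] or ( )


def review_dictionary(entries, common_words=frozenset(), min_length=3, max_repeats=2):
    """Return (category, entry name, detail) tuples worth checking manually.

    entries: list of {"name": str, "synonyms": [str, ...]} (hypothetical format).
    """
    # Count in how many *different* entries each synonym occurs
    counts = Counter(s for e in entries for s in set(e["synonyms"]))
    flags = []
    for entry in entries:
        name = entry["name"]
        if BARE_QID.fullmatch(name.strip()):
            flags.append(("qid_as_name", name, name))
        for syn in entry["synonyms"]:
            if len(syn) < min_length or syn.isdigit() or syn.lower() in common_words:
                flags.append(("common_or_short", name, syn))
            if BRACKETED.search(syn):
                flags.append(("bracketed", name, syn))
            if counts[syn] > max_repeats:
                flags.append(("repeated", name, syn))
    return flags
```

Flagging rather than deleting keeps the final decision manual, which suits questions that are still open.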

Preferring the XML file

TEXT FILE from SPARQL

  • Downloading the SPARQL results as a CSV document and converting it into a text file introduced many changes to the disease names: for example, the symbols "-", "," and "." were changed into other special characters. These had to be corrected manually, which took a lot of time.
  • A text document can hold only the disease names, not other attributes such as description, altLabels, etc., and each line must contain exactly one disease name.
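
One likely cause of the altered characters is an encoding mismatch, for example a spreadsheet program opening the UTF-8 CSV with a different default encoding; this is an assumption, not something verified here. A minimal sketch that skips the spreadsheet step, reading the CSV as UTF-8 and writing one disease name per line, is shown below. The column name diseaseLabel and the function name are hypothetical.

```python
import csv


def csv_to_term_list(csv_path: str, txt_path: str, column: str = "diseaseLabel") -> int:
    """Write one unique disease name per line; return how many were written."""
    seen = set()
    # "utf-8-sig" also strips a byte-order mark if one is present
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            name = (row.get(column) or "").strip()
            if name and name not in seen:
                seen.add(name)
                dst.write(name + "\n")
    return len(seen)
```

Duplicates are dropped because a query that joins synonyms or ICD-10 codes can return the same disease on several rows.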

XML FILE from SPARQL

  • Downloading the results from SPARQL as an XML file needs no manual correction. The file can be used to create dictionaries directly, using a similar syntax.
  • The other attributes of diseases (or of any other dictionary) can be included in the XML file and carried over into the dictionary easily.
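
As an illustration of carrying the extra attributes through, the sketch below turns SPARQL result rows, assumed to be already parsed into one Python dict per row, into dictionary XML with a description attribute and one synonym element per alternative label. The element and attribute names (dictionary, entry, term, name, wikidataID, description, synonym) are assumptions modelled loosely on the dictionary files linked above, and the row keys are hypothetical query variable names.

```python
import xml.etree.ElementTree as ET


def rows_to_dictionary(rows, title="disease") -> str:
    """Build dictionary XML from result rows (hypothetical keys:
    "disease" (entity URI), "diseaseLabel", "diseaseDescription", "altLabel")."""
    root = ET.Element("dictionary", title=title)
    entries = {}
    for row in rows:
        qid = row["disease"].rsplit("/", 1)[-1]  # e.g. ".../entity/Q100" -> "Q100"
        entry = entries.get(qid)
        if entry is None:
            # First row for this disease: create the entry with its attributes
            label = row.get("diseaseLabel", qid)
            entry = ET.SubElement(root, "entry", term=label, name=label, wikidataID=qid)
            if row.get("diseaseDescription"):
                entry.set("description", row["diseaseDescription"])
            entries[qid] = entry
        synonym = row.get("altLabel")
        # Later rows for the same disease only add synonyms not seen yet
        if synonym and synonym not in {s.text for s in entry.findall("synonym")}:
            ET.SubElement(entry, "synonym").text = synonym
    return ET.tostring(root, encoding="unicode")
```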