
Dictionary: Diseases

Lakshmi Devi Priya edited this page Jul 28, 2020 · 17 revisions

Owner :

Priya

Dictionary :

Disease

Find here :

https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_new.xml

https://github.com/petermr/openVirus/blob/master/dictionaries/test/disease_synonym.xml

Creation :

https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_dict.md

Overview :

This dictionary contains the names of commonly occurring diseases and of diseases that co-occur during a viral epidemic.

Number of entries : 17223

Source :

The disease names were collected from Wikidata using the Wikidata Query Service.

To do :

This dictionary still needs to be updated with the disease names for ICD-10 codes, retrieved through Wikidata.
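
As a rough illustration of how ICD-10 codes could be pulled from Wikidata, the sketch below builds a SPARQL query and parses the JSON that the Wikidata Query Service returns. It is not the query used to build this dictionary. It assumes that diseases are selected as instances of `Q12136` (disease) and that the ICD-10 code is stored in property `P494`; the function names and the `User-Agent` string are hypothetical. Items that are only subclasses of disease would be missed by this pattern.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit=100):
    """Return a SPARQL query for diseases that carry an ICD-10 code.

    Assumption: wdt:P31 wd:Q12136 (instance of disease) and wdt:P494 (ICD-10).
    """
    return f"""
SELECT ?disease ?diseaseLabel ?icd10 WHERE {{
  ?disease wdt:P31 wd:Q12136 ;
           wdt:P494 ?icd10 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into a list of plain dictionaries."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({
            # The entity URI ends with the Wikidata ID, e.g. .../entity/Q332590
            "wikidataID": binding["disease"]["value"].rsplit("/", 1)[-1],
            "name": binding.get("diseaseLabel", {}).get("value", ""),
            "icd10": binding["icd10"]["value"],
        })
    return rows


def fetch(limit=100):
    """Run the query against the live service (needs network access)."""
    params = urllib.parse.urlencode({"query": build_query(limit), "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "disease-dictionary-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch(10)` would return up to ten rows, each with a `wikidataID`, a `name`, and an `icd10` value, which could then be merged into the dictionary by Wikidata ID.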

Issues and doubts

I have updated the synonym dictionary for disease at https://github.com/petermr/openVirus/blob/master/dictionaries/test/disease_synonym.xml. I looked at a few pages and have some questions about the synonyms created in the dictionary:

  1. The synonyms include common words, letters, and numbers such as 2, Male, face, and neck, X, and X-linked. If this dictionary is used in an ami search, will the resulting DataTables include these common words?
  2. Some terms contain special characters but still have a Wikidata ID or appear among the synonyms. For example, Uberkoten (whose special characters cannot be reproduced here) has Wikidata ID Q332590, and Chédiak and François appear in synonyms. Is it okay to leave these, or should they be removed manually?
  3. Some entries have a Wikidata ID in place of a name. The entries for Q886810 and Q1607642 have no name other than the ID itself. Should I remove them manually?
  4. Some synonyms contain bracketed words, such as <synonym>Dwarfism : [pitutary] or [hypophyseal (& Lorain - Levi)]</synonym>. Will such a synonym be matched as a whole, as separate parts, or both? If only as a whole, might it miss some words or terms?
  5. Some synonyms, such as NOS and X-linked, are repeated more than twice across different entries. Should they be removed?
  6. The synonyms also include acronyms, such as PAN (polyarteritis nodosa) and DISH (diffuse idiopathic skeletal hyperostosis).
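
Two of the questions above (entries named only by a Wikidata ID, and synonyms repeated across entries) can be checked automatically before any manual clean-up. The following is a minimal sketch, assuming the dictionary is an XML file whose `<entry>` elements carry a `term` attribute and `<synonym>` children; the sample data and the function name `audit_dictionary` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the assumed dictionary layout.
SAMPLE_XML = """<dictionary title="disease">
  <entry term="Q886810" name="Q886810"/>
  <entry term="polyarteritis nodosa" name="polyarteritis nodosa">
    <synonym>PAN</synonym>
    <synonym>NOS</synonym>
  </entry>
  <entry term="dwarfism" name="dwarfism">
    <synonym>NOS</synonym>
    <synonym>X-linked</synonym>
  </entry>
</dictionary>"""

# A bare Wikidata ID: the letter Q followed by digits only.
QID_PATTERN = re.compile(r"Q\d+")


def audit_dictionary(xml_text, min_entries=2):
    """Return (terms that are bare Wikidata IDs, synonyms shared by several entries)."""
    root = ET.fromstring(xml_text)
    qid_named = []
    entries_by_synonym = defaultdict(set)
    for index, entry in enumerate(root.iter("entry")):
        term = entry.get("term", "")
        if QID_PATTERN.fullmatch(term):
            qid_named.append(term)
        for synonym in entry.findall("synonym"):
            text = (synonym.text or "").strip()
            if text:
                entries_by_synonym[text].add(index)
    repeated = sorted(
        text for text, indexes in entries_by_synonym.items()
        if len(indexes) >= min_entries
    )
    return qid_named, repeated


qid_named, repeated = audit_dictionary(SAMPLE_XML)
print(qid_named)  # ['Q886810']
print(repeated)   # ['NOS']
```

Raising `min_entries` to 3 would flag only synonyms that occur in more than two entries, which matches the wording of question 5. The function only reports candidates; whether to delete them remains a manual decision.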

Blacklist

Hand-edit list

  • For the disease_new dictionary, the SPARQL query results were downloaded as a CSV file and converted into a text document. The conversion turned some symbols, such as - and ', into special characters that amidict did not recognize during dictionary creation, so they were edited manually.
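
The manual edit described above could be partly automated. The sketch below assumes the unrecognized characters were typographic variants of the hyphen and apostrophe (for example an en dash or a curly quote); the exact characters are not recorded on this page, so the mapping is illustrative. If the damage came from reading the file with the wrong encoding, re-reading it with the correct encoding would be the better fix.

```python
# Map typographic punctuation to plain ASCII equivalents.
# Assumption: these are the kinds of characters the conversion introduced.
REPLACEMENTS = {
    "\u2010": "-",   # hyphen
    "\u2011": "-",   # non-breaking hyphen
    "\u2012": "-",   # figure dash
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u00a0": " ",   # no-break space
}
TABLE = str.maketrans(REPLACEMENTS)


def normalize_line(line):
    """Replace typographic punctuation; accented letters are left untouched."""
    return line.translate(TABLE)


print(normalize_line("Beh\u00e7et\u2019s disease"))        # Behçet's disease
print(normalize_line("Guillain\u2013Barr\u00e9 syndrome"))  # Guillain-Barré syndrome
```

Only punctuation is replaced, so legitimate accented names such as Chédiak or François pass through unchanged.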

False positives

  • The synonym dictionary contains common words as synonyms. These will create false positives in DataTables, so they must be removed.
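
One possible way to remove such synonyms is to filter them against a blacklist. This is a minimal sketch, assuming the same XML layout of `<entry>` elements with `<synonym>` children; the blacklist holds only a few of the examples named on this page, and the function name and sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative blacklist, compared case-insensitively.
BLACKLIST = {"2", "male", "x", "x-linked", "nos"}

# Hypothetical sample entry in the assumed dictionary layout.
SAMPLE_XML = """<dictionary title="disease">
  <entry term="example disease" name="example disease">
    <synonym>Male</synonym>
    <synonym>2</synonym>
    <synonym>example syndrome</synonym>
  </entry>
</dictionary>"""


def remove_blacklisted(xml_text, blacklist=BLACKLIST, min_length=2):
    """Drop synonyms that are blacklisted or shorter than min_length characters."""
    root = ET.fromstring(xml_text)
    removed = []
    for entry in root.iter("entry"):
        # Copy the list first so elements can be removed while looping.
        for synonym in list(entry.findall("synonym")):
            text = (synonym.text or "").strip()
            if text.lower() in blacklist or len(text) < min_length:
                entry.remove(synonym)
                removed.append(text)
    return ET.tostring(root, encoding="unicode"), removed


cleaned, removed = remove_blacklisted(SAMPLE_XML)
print(removed)  # ['Male', '2']
```

Returning the list of removed synonyms makes it easy to review what was dropped. A higher `min_length` would also remove short acronyms, so it is kept low here.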