Dictionary: Diseases
Priya
Disease
- From SPARQL as a text file: https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_new.xml
- From SPARQL as an XML file:
  i) synonym dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/disease_synonym.xml
  ii) latest dictionary with ICD-10 codes: https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/disease_icd10.xml
  iii) multilingual dictionary of 3 languages: https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/disease_lang.xml
The creation of these 4 disease dictionaries is described at https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/disease_dict.md
This dictionary contains the names of diseases that commonly occur during a viral epidemic and the names of diseases that co-occur with them.
Created from Wikidata using the Wikidata Query Service (SPARQL). The latest and multilingual dictionaries were created from ICD-10 codes and SPARQL.
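To make the Wikidata step repeatable, the query can be scripted. The following is a minimal Python sketch, not the query actually used for these dictionaries. It assumes that diseases are items that are instances of *disease* (`Q12136`) with an ICD-10 code (`P494`), and that aliases (`skos:altLabel`) serve as synonyms; `DISEASE_QUERY`, `build_request` and `fetch_results` are hypothetical names. It asks the endpoint for the SPARQL XML results format so the download can skip the CSV round trip.

```python
import urllib.parse
import urllib.request

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical query: instances of "disease" (Q12136) that have an ICD-10 code
# (P494), with English label, description and aliases joined by "|".
DISEASE_QUERY = """
SELECT ?item ?itemLabel ?itemDescription ?icd10
       (GROUP_CONCAT(DISTINCT ?alias; separator="|") AS ?altLabels)
WHERE {
  ?item wdt:P31 wd:Q12136 ;
        wdt:P494 ?icd10 .
  OPTIONAL { ?item skos:altLabel ?alias . FILTER(LANG(?alias) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item ?itemLabel ?itemDescription ?icd10
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking WDQS for SPARQL XML results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+xml",
            # WDQS asks clients to identify themselves with a User-Agent.
            "User-Agent": "openVirus-dictionary-sketch/0.1 (example)",
        },
    )


def fetch_results(path: str, query: str = DISEASE_QUERY, timeout: float = 60.0) -> None:
    """Download the results and save the raw XML bytes unchanged."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

Calling `fetch_results("disease_results.xml")` needs network access. Writing the raw bytes avoids any re-encoding of names on the way to disk.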
Aim: to iterate on the synonyms in the disease dictionaries to get more accurate results.
I have updated the synonym dictionary for diseases at https://github.com/petermr/openVirus/blob/master/dictionaries/test/disease_synonym.xml. I looked at a few pages and have some questions about the synonyms created in the dictionary:
- The synonyms include some common words, letters and numbers, such as `2`, `Male`, `face`, `and neck`, `X`, `X-linked` and so on. If this dictionary is used in `ami search`, will the resulting DataTables include these common words?
- Some words contain special letters and either carry a Wikidata ID or appear among the synonyms. For example, `Uberkoten` (its correct special letters cannot be reproduced here) has the Wikidata ID `Q332590`, and `Chédiak` and `François` appear in `synonyms`. Is it okay to leave them, or should they be removed manually?
- Some entry names contain the Wikidata ID instead of a name. The Wikidata IDs `Q886810` and `Q1607642` have no entry name, only the ID itself. Should I remove them manually?
- Some synonyms contain bracketed words, like `<synonym>Dwarfism : [pitutary] or [hypophyseal (& Lorain - Levi)]</synonym>`. Will such a synonym be used as a whole, as separate terms, or both? If only as a whole, might it miss some words/terms?
- Some synonyms, such as `NOS` and `X-linked`, are repeated more than twice across different entries. Should they be removed?
- There are also acronyms among the `synonyms`, such as `PAN` (Polyarteritis nodosa), `DISH` (Diffuse Idiopathic Skeletal Hyperostosis), etc.
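The checks behind these questions can be automated before each iteration of the synonym dictionary. The following is a minimal Python sketch, assuming an ami-style layout of `<entry term=... name=...>` elements with `<synonym>` children; `COMMON_WORDS`, `audit` and the toy `SAMPLE` (built from the examples above, not real dictionary content) are hypothetical. It only flags candidates for manual review and removes nothing.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical list of generic tokens that should not act as disease synonyms.
COMMON_WORDS = {"male", "female", "face", "and neck", "x", "x-linked", "nos"}


def audit(xml_text: str) -> dict:
    """Flag suspicious names and synonyms in an ami-style dictionary."""
    root = ET.fromstring(xml_text)
    report = {"id_as_name": [], "generic": [], "acronyms": [], "bracketed": {}}
    counts = Counter()
    for entry in root.iter("entry"):
        name = entry.get("name") or entry.get("term") or ""
        # Entry whose "name" is just a Wikidata ID such as Q886810.
        if re.fullmatch(r"Q\d+", name):
            report["id_as_name"].append(name)
        synonyms = [(s.text or "").strip() for s in entry.findall("synonym")]
        counts.update(set(synonyms))  # count each synonym once per entry
        for syn in synonyms:
            if syn.isdigit() or len(syn) < 2 or syn.lower() in COMMON_WORDS:
                report["generic"].append((name, syn))
            elif re.fullmatch(r"[A-Z]{2,6}", syn):
                report["acronyms"].append((name, syn))
            if "[" in syn:
                # Bracketed parts that could also be searched as separate terms.
                report["bracketed"][syn] = re.findall(r"\[([^\]]+)\]", syn)
    # Synonyms shared by more than one entry.
    report["shared"] = sorted(s for s, n in counts.items() if n > 1)
    return report


SAMPLE = """<dictionary title="disease_synonym_toy">
  <entry term="Q886810" name="Q886810"/>
  <entry term="Dwarfism" name="Dwarfism">
    <synonym>Dwarfism : [pitutary] or [hypophyseal (&amp; Lorain - Levi)]</synonym>
  </entry>
  <entry term="Polyarteritis nodosa" name="Polyarteritis nodosa">
    <synonym>PAN</synonym>
  </entry>
  <entry term="toy-a" name="toy-a"><synonym>X-linked</synonym><synonym>2</synonym></entry>
  <entry term="toy-b" name="toy-b"><synonym>X-linked</synonym><synonym>NOS</synonym></entry>
</dictionary>"""
```

Running `audit(SAMPLE)` reports `Q886810` as an ID used as a name, `X-linked` as shared between entries, `PAN` as an acronym, and the two bracketed parts of the Dwarfism synonym. Whether flagged items are then dropped or kept is still a manual decision.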
- Downloading the results from SPARQL as a CSV document and converting it into a text file introduces many changes in the disease names: for example, the symbols `-` and `.` were changed into other special characters. These were corrected manually, which took a lot of time.
- In the text document, only the disease name can be given, not other attributes such as description, altLabels, etc. Each line holds exactly one disease name.
- Downloading the results from SPARQL as an XML file needs no manual correction; the file can be used to create dictionaries directly with similar syntax.
- Other attributes of the diseases (or of any other dictionary) can be included in the XML file and converted into the dictionary easily.
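The XML route above can be sketched as a direct conversion from the W3C SPARQL XML results format into a dictionary file, carrying description and altLabels along. This is a minimal Python sketch, not the converter used for these dictionaries: it assumes result variables named `item`, `itemLabel`, `itemDescription` and `altLabels` (aliases joined by `|`), an ami-style `<entry term=... name=... wikidataID=... description=...>` output layout, and a hypothetical function name `sparql_xml_to_dictionary`. `SAMPLE_RESULTS` holds toy values only.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C "SPARQL Query Results XML Format".
SR = "{http://www.w3.org/2005/sparql-results#}"


def sparql_xml_to_dictionary(results_xml: str, title: str) -> ET.Element:
    """Convert SPARQL XML results into an ami-style <dictionary> element."""
    root = ET.fromstring(results_xml)
    dictionary = ET.Element("dictionary", {"title": title})
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            value = binding[0]  # exactly one <uri>, <literal> or <bnode> child
            row[binding.get("name")] = (value.text or "").strip()
        label = row.get("itemLabel")
        if not label:
            continue  # skip rows without a usable name
        attrs = {"term": label, "name": label}
        if row.get("item"):
            # Keep only the trailing Q-id of the entity URI.
            attrs["wikidataID"] = row["item"].rsplit("/", 1)[-1]
        if row.get("itemDescription"):
            attrs["description"] = row["itemDescription"]
        entry = ET.SubElement(dictionary, "entry", attrs)
        for alias in filter(None, row.get("altLabels", "").split("|")):
            ET.SubElement(entry, "synonym").text = alias.strip()
    return dictionary


SAMPLE_RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="item"/><variable name="itemLabel"/>
    <variable name="itemDescription"/><variable name="altLabels"/>
  </head>
  <results>
    <result>
      <binding name="item"><uri>http://www.wikidata.org/entity/Q0</uri></binding>
      <binding name="itemLabel"><literal xml:lang="en">toy disease - type 2</literal></binding>
      <binding name="itemDescription"><literal xml:lang="en">toy description, for testing</literal></binding>
      <binding name="altLabels"><literal>toy-d|tóy disease</literal></binding>
    </result>
  </results>
</sparql>"""
```

The result can be saved with `ET.ElementTree(d).write("disease_dict.xml", encoding="utf-8", xml_declaration=True)`. Because the file stays in XML and UTF-8 end to end, symbols such as `-`, `,` and accented letters pass through without the manual fixes the CSV route needed.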