miniproject: viral epidemics and disease
Lakshmi Devi Priya edited this page Sep 20, 2020
·
34 revisions
Priya
Dheeraj Kumar
- Use the communal corpus `epidemic50noCov`, consisting of 50 articles. CREATED
- Scrutinizing the 50 articles to identify the true positives and false positives, that is, whether each article is about a viral epidemic or not. FINISHED
- Using `ami search` to find whether the articles mention any comorbidity in a viral epidemic, annotating with dictionaries to create ami DataTables. FINISHED
- Sectioning the articles using `ami:section` to extract the relevant information on comorbidity. FINISHED
- Refining and rerunning the query to get a corpus of 950 articles. CREATED
- Scrutinizing the 950 articles for true positives and false positives and creating a spreadsheet. PROGRESSING
- Using `ami search` to create DataTables and `ami section` for sectioning the 950 articles. FINISHED
- Using a relevant ML technique to classify whether the articles are about a viral epidemic and which diseases/disorders co-occur with it. PROGRESSING
- Creating a dashboard of knowledge, especially with an annotated map. NOT STARTED
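
The scrutiny of true and false positives described above can be summarised as a single precision figure for the query. The sketch below is only an illustration: the article IDs and judgements are made-up placeholders, and the `precision` helper is hypothetical, not part of `ami` or `getpapers`.

```python
def precision(labels):
    """Return the fraction of retrieved articles judged to be true positives."""
    if not labels:
        raise ValueError("no labels given")
    true_positives = sum(1 for is_relevant in labels.values() if is_relevant)
    return true_positives / len(labels)


# Hypothetical manual judgements: article ID -> is it about a viral epidemic?
manual_labels = {
    "PMC0000001": True,
    "PMC0000002": False,
    "PMC0000003": True,
    "PMC0000004": True,
}

query_precision = precision(manual_labels)
print(f"precision = {query_precision:.2f}")
```

A low value would suggest that the query needs further refining before the corpus is enlarged.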
- A spreadsheet listing the comorbidities reported during viral epidemics and their counts:
  - for 50 articles in `epidemic50noCov`. FINISHED
  - for 950 articles in the `disease` corpus. PROGRESSING
- Development of the ML model for data classification, assessed on accuracy. PROGRESSING
- Annotated map with the obtained data. NOT STARTED
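
As a rough sketch of how the comorbidity spreadsheet could be tallied, the example below counts in how many articles each disease term appears and writes the result as CSV. The input dictionary is invented for illustration; in practice the terms would come from the `ami search` DataTables, whose exact layout is not assumed here.

```python
import csv
import io
from collections import Counter


def count_comorbidities(hits_per_article):
    """Count in how many articles each disease term was found."""
    counts = Counter()
    for terms in hits_per_article.values():
        counts.update(set(terms))  # count each term at most once per article
    return counts


def write_counts_csv(counts, handle):
    """Write the counts as CSV, most frequent term first (ties alphabetical)."""
    writer = csv.writer(handle)
    writer.writerow(["disease", "article_count"])
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([term, n])


# Hypothetical dictionary hits per article (placeholder IDs and terms).
hits = {
    "PMC0000001": ["diabetes", "pneumonia", "diabetes"],
    "PMC0000002": ["pneumonia"],
    "PMC0000003": ["hypertension", "pneumonia"],
}

counts = count_comorbidities(hits)
buffer = io.StringIO()
write_counts_csv(counts, buffer)
print(buffer.getvalue())
```

Counting each term once per article keeps a single long article from dominating the totals; counting every mention instead would be an equally valid design choice.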
- Initially the communal corpus `epidemic50noCov` will be used (a small test corpus before using the large corpus `disease`).
- Later the `disease` corpus of 950 articles will be used. It was created with the command `getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o disease -f disease/log.txt -k 950 -x -p`.
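
For anyone scripting the download, the `getpapers` call above can be assembled as an argument list in Python. This is a minimal sketch: the `build_getpapers_command` helper is hypothetical, it uses only the flags that appear in the command above, and it prints the command rather than running it.

```python
import shlex


def build_getpapers_command(query, output_dir, limit):
    """Assemble the getpapers call as an argument list (nothing is executed)."""
    return [
        "getpapers",
        "-q", query,                      # search query sent to EuPMC
        "-o", output_dir,                 # output directory (the corpus)
        "-f", f"{output_dir}/log.txt",    # log file
        "-k", str(limit),                 # maximum number of articles
        "-x",                             # download full-text XML
        "-p",                             # download PDFs
    ]


QUERY = "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2"
command = build_getpapers_command(QUERY, "disease", 950)
print(shlex.join(command))
```

If `getpapers` is installed, the list can be passed to `subprocess.run(command, check=True)`; keeping the query as one list element avoids shell quoting problems.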
- Disease (Details at https://github.com/petermr/openVirus/wiki/Dictionary:-Diseases)
- `getpapers` to create the corpus of 950 articles from `EuPMC`.
- `AMI` for creating DataTables, creating and using dictionaries, and sectioning.
- `SPARQL` for creating dictionaries.
- `KNIME` for workflow and analytics.
- `keras` for binary classification.
- 50 articles corpus `epidemic50noCov` at https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
- for `getpapers`: https://github.com/petermr/openVirus/wiki/getpapers
- for installing & updating `ami`: https://github.com/petermr/openVirus/wiki/Tools:-ami3
- for `ami search`: https://github.com/petermr/openVirus/wiki/ami-search
- for `ami section`: https://github.com/petermr/openVirus/wiki/ami:section
- for `SPARQL`: https://github.com/petermr/openVirus/wiki/Tools-:-SPARQL
- 950 articles corpus `disease` at https://github.com/petermr/openVirus/tree/master/miniproject/disease
- for the ML technique, `jupyter notebook` is used: https://github.com/petermr/openVirus/wiki/Jupyter-Notebooks#data-preparation-for-ml
(by collaborator Dheeraj)
Our first aim is this: if we can recognize the diseases, then medicines can be given for them.
In this mini project, we find the diseases that occur in times of "viral epidemic" with the help of the disease dictionary, using the ContentMine software (getpapers and ami).
- The disease dictionary holds the names of diseases, much as an ordinary dictionary holds a store of words, and is kept updated. It is used to search for particular disease terms in the articles.
- Its sources are ICD-10 (by WHO) and Wikidata, and it was created using ami.
- This is a group of articles about viral epidemics and diseases. These articles contain information on diseases, which is to be simplified.
- This is a group of 950 articles that have been downloaded from EuPMC via getpapers.
This is a PubMed Central website hosting a large number of scientific research articles. We are analyzing some of these articles, downloaded using getpapers, for our mini-project.
- It is ContentMine software capable of downloading large numbers of articles from EuPMC.
- See https://github.com/petermr/openVirus/wiki/getpapers for usage.
- It is also ContentMine software. It is used for creating dictionaries, searching the articles for the disease terms held in a dictionary, sectioning downloaded articles and gathering information from them.
- For example, in this project we have created a dictionary of diseases.
- The online query service provided by Wikidata; it can be queried for almost anything held in Wikidata.
- In this mini project we needed the ICD-10 code for each disease.
- We also wanted the results in different languages.
- We obtained the following result: https://w.wiki/ZCr
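
To illustrate the kind of query described above, the sketch below builds a SPARQL query for Wikidata items that carry an ICD-10 code (property `P494`) with optional labels in several languages, and wraps it in a URL for the public endpoint. It is an assumption-laden example, not the query behind the short link above: the language codes and the `build_disease_query` helper are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_disease_query(languages):
    """Build SPARQL for items with an ICD-10 code and labels in the given languages."""
    select_vars = ["?disease", "?icd10"]
    optional_clauses = []
    for lang in languages:
        # SPARQL variable names cannot contain hyphens, so sanitise the code.
        var = "?label_" + lang.replace("-", "_")
        select_vars.append(var)
        optional_clauses.append(
            f'  OPTIONAL {{ ?disease rdfs:label {var} . FILTER(LANG({var}) = "{lang}") }}'
        )
    return (
        f"SELECT {' '.join(select_vars)} WHERE {{\n"
        "  ?disease wdt:P494 ?icd10 .\n"  # P494 is the Wikidata ICD-10 property
        + "\n".join(optional_clauses)
        + "\n}"
    )


# Hypothetical choice of languages: English, Hindi, Tamil.
query = build_disease_query(["en", "hi", "ta"])
url = f"{WIKIDATA_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
print(query)
```

The labels are wrapped in `OPTIONAL` so that a disease without a label in one language still appears in the results.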
- I have read about getpapers and EuPMC, including advanced search in EuPMC and reading its articles.
- I read about Wikidata and learned to update the dictionary.
- I also updated the dictionary with ICD-10 codes with the help of the Wikidata query service.
- So far I have manually classified some articles as true and false positives.
- Created a SPARQL query for a multilingual `disease` dictionary.
- As stated above, if the diseases are known, then medicines can be given accordingly. Our main goal is therefore to find the names of the diseases that co-occur during viral epidemics and work accordingly.
- All the articles now have to be manually classified as true positives or false positives.
- Learning `KNIME` for workflow and analytics.
- Learning `Keras` and Python code in `Jupyter Notebook` to use in binary classification.
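
To make the idea of binary classification concrete, here is a small standard-library sketch. It is a stand-in, not the project's Keras model: it trains a plain logistic regression on bag-of-words features, which has the same form as a single sigmoid output unit. The training texts and labels are invented for illustration.

```python
import math


def tokenize(text):
    return text.lower().split()


def featurize(text, vocabulary):
    """Binary bag-of-words vector: 1.0 if the word occurs in the text."""
    tokens = set(tokenize(text))
    return [1.0 if word in tokens else 0.0 for word in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=200, learning_rate=0.5):
    """Fit weights and bias by stochastic gradient descent on cross-entropy."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            prediction = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = prediction - label  # gradient of the loss w.r.t. the logit
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(text, vocabulary, weights, bias):
    """Return True if the text is classified as being about a viral epidemic."""
    features = featurize(text, vocabulary)
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features))) >= 0.5


# Hypothetical toy data: 1 = about a viral epidemic, 0 = not.
texts = [
    "influenza outbreak spread across the region",
    "ebola virus epidemic in west africa",
    "crop yield under drought conditions",
    "bridge design and steel fatigue",
]
labels = [1, 1, 0, 0]

vocabulary = sorted({word for text in texts for word in tokenize(text)})
samples = [featurize(text, vocabulary) for text in texts]
weights, bias = train(samples, labels)
print(predict("a new virus outbreak", vocabulary, weights, bias))
```

With a real corpus the manually classified true and false positives would supply the labels, and part of the labelled data would be held back to measure accuracy.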
- The 950-article corpus was large, so running `ami search` on it raised an OutOfMemoryError.
- Hence, the `disease` corpus (Cproject) was split into 4 parts of 200-250 Ctrees each.
- `ami search` was then run successfully on each part, which created the DataTables.
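
The splitting step above can be sketched as follows. This example only partitions a list of names into near-equal parts; the article IDs are placeholders, the `split_into_parts` helper is hypothetical, and actually moving the Ctree directories into separate Cprojects is left out.

```python
def split_into_parts(ctree_names, n_parts):
    """Split a list of Ctree names into n_parts contiguous, near-equal chunks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(ctree_names), n_parts)
    parts = []
    start = 0
    for index in range(n_parts):
        size = base + (1 if index < extra else 0)  # spread the remainder
        parts.append(ctree_names[start:start + size])
        start += size
    return parts


# Placeholder names standing in for the 950 Ctree directories.
ctrees = [f"PMC{i:07d}" for i in range(950)]
parts = split_into_parts(ctrees, 4)
print([len(part) for part in parts])  # [238, 238, 237, 237]
```

Each part stays within the 200-250 range mentioned above, so `ami search` can be run on one part at a time.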
- Initially, on Windows, `amisearch` created an empty `_cooccurence` folder.
- After debugging, AMI was updated, which gave the desired result in the `_cooccurence` folder.
- Thus the error was rectified.
(Reference from Ambreen's update)
- Download VS Code and clone the openVirus repository onto your system.
- Open the `openVirus` folder in VS Code (don't close it).
- Now open the openVirus folder in your directory and make your changes in it.
- Return to the minimized VS Code window and commit the changes by selecting the commit symbol. This may take some time, depending on the size of the files being uploaded.
- After adding the remote repository, push the changes to GitHub. See this video for other clarification.
NOTE: If you have already cloned the repository, first pull the repo and then push the changes.