miniproject: viral epidemics and disease

What diseases co-occur with viral epidemics?

owner:

Priya

collaborators:

Dheeraj Kumar

miniproject summary

proposed activities

Using the communal corpus epidemic50noCov, consisting of 50 articles. CREATED
Scrutinizing the 50 articles to identify true positives and false positives, that is, whether or not each article is about a viral epidemic. FINISHED
Using ami search with dictionaries to annotate the articles, find whether they mention any comorbidity in a viral epidemic, and create ami DataTables. FINISHED
Sectioning the articles using ami:section to extract the relevant information on comorbidity. FINISHED
Refining and rerunning the query to get a corpus of 950 articles. CREATED
Scrutinizing the 950 articles for true positives and false positives and recording the results in a spreadsheet. PROGRESSING
Using ami search to create DataTables and ami section to section the 950 articles. FINISHED
Using a suitable ML technique to classify whether articles are about a viral epidemic and which diseases/disorders co-occur. PROGRESSING
Creating a knowledge dashboard, especially with an annotated map. NOT STARTED

outcomes

A spreadsheet of the comorbidities reported during viral epidemics, with their counts, will be developed:

for 50 articles in epidemic50noCov. FINISHED
for 950 articles in the disease corpus. PROGRESSING

Development of an ML model for data classification, evaluated on accuracy. PROGRESSING
An annotated map of the obtained data. NOT STARTED
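The spreadsheet outcome could be produced along the following lines. This is a minimal sketch, assuming per-article counts of disease terms are already available as a mapping from article ID to a `Counter` (for example, after parsing the ami DataTables, a step not shown here); the function name, column names and example data are hypothetical. It writes a CSV file, which any spreadsheet program can open.

```python
from __future__ import annotations

import csv
from collections import Counter


def write_comorbidity_sheet(per_article: dict[str, Counter], path: str) -> None:
    """Write one row per disease: number of articles mentioning it and total mentions."""
    n_articles: Counter = Counter()
    n_mentions: Counter = Counter()
    for counts in per_article.values():
        for disease, hits in counts.items():
            n_articles[disease] += 1      # article-level count
            n_mentions[disease] += hits   # total number of mentions
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["disease", "articles", "mentions"])
        # Most frequently mentioned (by article count) first.
        for disease, n in n_articles.most_common():
            writer.writerow([disease, n, n_mentions[disease]])


# Placeholder per-article counts, for illustration only.
example = {
    "article_1": Counter({"pneumonia": 2}),
    "article_2": Counter({"pneumonia": 1, "asthma": 1}),
}
# Usage: write_comorbidity_sheet(example, "comorbidity_counts.csv")
```

Reporting both the number of articles and the total number of mentions keeps one noisy article from dominating the ranking.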

corpora `CREATED`

Initially, the communal corpus epidemic50noCov will be used (a small test corpus before moving on to the larger disease corpus).
Later, a corpus of 950 articles in the disease directory, created with the command `getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o disease -f disease/log.txt -k 950 -x -p`, will be used.
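To make the corpus easy to rebuild or refine, the getpapers command can be wrapped in a short Python script. This is a minimal sketch, assuming getpapers is installed and on PATH; the helper names `getpapers_args` and `run_getpapers` are hypothetical, and the flags are exactly those of the command used for the disease corpus (`-x` and `-p` request full-text XML and PDF).

```python
from __future__ import annotations

import shlex
import shutil
import subprocess

# Query used to build the 950-article disease corpus.
QUERY = "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2"


def getpapers_args(query: str, outdir: str, limit: int) -> list[str]:
    """Build the getpapers argument list for a query, output directory and hit limit."""
    return [
        "getpapers",
        "-q", query,
        "-o", outdir,
        "-f", f"{outdir}/log.txt",
        "-k", str(limit),
        "-x",  # full-text XML
        "-p",  # full-text PDF
    ]


def run_getpapers(query: str, outdir: str, limit: int) -> None:
    """Run getpapers, failing early if it is not installed."""
    if shutil.which("getpapers") is None:
        raise RuntimeError("getpapers is not on PATH")
    subprocess.run(getpapers_args(query, outdir, limit), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command without downloading anything.
    print(shlex.join(getpapers_args(QUERY, "disease", 950)))
```

Keeping the query in one constant means a refined query is rerun with exactly the same options.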

dictionaries

Disease (Details at https://github.com/petermr/openVirus/wiki/Dictionary:-Diseases)

software

getpapers to create the corpus of 950 articles from EuPMC.
ami for creating DataTables, creating and using dictionaries, and sectioning.
SPARQL for creating dictionaries.
KNIME for workflow and analytics.
Keras for binary classification.

constraints

Respective pages

50 articles corpus epidemic50noCov at - https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
for getpapers - https://github.com/petermr/openVirus/wiki/getpapers
for installing & updating ami - https://github.com/petermr/openVirus/wiki/Tools:-ami3
for ami search - https://github.com/petermr/openVirus/wiki/ami-search
for ami section - https://github.com/petermr/openVirus/wiki/ami:section
for SPARQL - https://github.com/petermr/openVirus/wiki/Tools-:-SPARQL
950 articles corpus disease at - https://github.com/petermr/openVirus/tree/master/miniproject/disease
for the ML technique, a Jupyter notebook is used - https://github.com/petermr/openVirus/wiki/Jupyter-Notebooks#data-preparation-for-ml

Initial Summary

(by collaborator Dheeraj)

The aim of the mini-project

Our first aim is this: if we can recognize diseases, then appropriate medicines can be given for them. In this mini-project, we find diseases mentioned in articles about "viral epidemics" with the help of the disease dictionary, using ContentMine software (getpapers and ami).

Resources

Dictionary

The disease dictionary contains disease names and is used to search the articles for mentions of particular diseases; like any dictionary, it is a store of words.
Its sources are ICD-10 (by WHO) and Wikidata, and it was created using ami.
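Conceptually, a dictionary search counts how often each dictionary term occurs in an article. The sketch below illustrates that idea in plain Python; it is not how ami search is implemented, and the three-entry `DISEASES` mapping (term to ICD-10 code) is a hypothetical stand-in for the real disease dictionary.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical mini-dictionary (term -> ICD-10 code); the real disease
# dictionary built with ami is far larger.
DISEASES = {
    "pneumonia": "J18",
    "diabetes mellitus": "E14",
    "asthma": "J45",
}


def count_disease_terms(text: str, dictionary: dict[str, str]) -> Counter:
    """Count whole-word, case-insensitive matches of each dictionary term in text."""
    counts: Counter = Counter()
    for term in dictionary:
        # \b anchors stop a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[term] = hits
    return counts


sample = "Pneumonia and asthma were common; pneumonia was often severe."
print(count_disease_terms(sample, DISEASES))  # Counter({'pneumonia': 2, 'asthma': 1})
```

Exact matching like this misses synonyms and spelling variants, which is one reason a curated dictionary with Wikidata links is worth building.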

Corpus 950

This is a collection of 950 articles on viral epidemics and diseases, downloaded from EPMC via getpapers. The disease information in these articles is what we aim to extract and simplify.

EPMC

EPMC (Europe PMC) is an open repository of life-science research articles, including the full-text articles in PubMed Central. We analyze a subset of these articles for our mini-project, downloaded using getpapers.
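The same kind of search can also be sent directly to Europe PMC's public REST web service, for example to check how many articles a query matches before downloading them. This is a sketch, assuming the `search` endpoint and its `query`, `format` and `pageSize` parameters; the helper names are hypothetical, and `fetch_hit_count` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC public REST search endpoint.
EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def eupmc_search_url(query: str, page_size: int = 25) -> str:
    """Build a Europe PMC search URL that returns JSON."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUPMC_SEARCH + "?" + urlencode(params)


def fetch_hit_count(query: str) -> int:
    """Ask Europe PMC how many records match `query` (needs network access)."""
    with urlopen(eupmc_search_url(query, page_size=1), timeout=30) as resp:
        return int(json.load(resp)["hitCount"])


if __name__ == "__main__":
    print(eupmc_search_url('"viral epidemics" AND human'))
```

Checking the hit count first is a cheap way to see how much a query refinement (such as adding `NOT COVID`) changes the corpus size.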

Tools

getpapers

getpapers is ContentMine software that can download large numbers of articles from EuPMC.
See https://github.com/petermr/openVirus/wiki/getpapers for usage.

ami

ami is also ContentMine software. It is used to create dictionaries, to search articles for the disease terms listed in a dictionary, and to section downloaded articles and gather information from them.
In this mini-project, we used it to create the disease dictionary.

Wikidata SPARQL

Wikidata's online query service lets you query Wikidata's data with SPARQL.
In this mini-project we needed the ICD-10 code for each disease,
and wanted the disease names in different languages.
The resulting query is available at https://w.wiki/ZCr.
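The project's actual query is the one saved at https://w.wiki/ZCr. As an illustration only, a comparable query can select every Wikidata item that has an ICD-10 code (property P494) together with its labels in chosen languages. The language list and helper names below are assumptions, and this is not the saved query.

```python
from __future__ import annotations

from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def icd10_disease_query(languages: list[str]) -> str:
    """SPARQL: items with an ICD-10 code (P494) and their labels in `languages`."""
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    # Doubled braces are literal { } in the f-string.
    return f"""SELECT ?item ?icd10 ?label WHERE {{
  ?item wdt:P494 ?icd10 ;
        rdfs:label ?label .
  FILTER(LANG(?label) IN ({lang_list}))
}}"""


def wdqs_url(query: str) -> str:
    """GET URL asking the Wikidata Query Service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(icd10_disease_query(["en", "hi"]))
```

The query text can also be pasted straight into the Wikidata Query Service web interface to inspect the results.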

Work done

I have read about getpapers and EuPMC, including EuPMC's advanced search, and have been reading its articles.
I studied Wikidata and learned how to update the dictionary.
I also updated the dictionary with ICD-10 codes using a Wikidata query.
So far I have manually classified some articles as true and false positives.
I created a SPARQL query for a multilingual disease dictionary.

My goal

As stated above, if diseases are known, medicines can be given accordingly. Our main goal is therefore to find the names of diseases that co-occur during viral epidemics and act on that knowledge.
Next, all the articles have to be manually classified as true positives or false positives.
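Once articles have been labelled by hand, the fraction of retrieved articles that are true positives gives the precision of the search query, TP / (TP + FP). A minimal sketch, assuming the manual labels are stored as a mapping from article ID to `True` (true positive) or `False` (false positive); the IDs and labels are placeholders.

```python
from __future__ import annotations


def precision(labels: dict[str, bool]) -> float:
    """TP / (TP + FP) for manually labelled articles (True = true positive)."""
    if not labels:
        raise ValueError("no labelled articles")
    true_positives = sum(1 for is_tp in labels.values() if is_tp)
    return true_positives / len(labels)


# Placeholder article IDs and labels, for illustration only.
manual_labels = {"article_1": True, "article_2": True, "article_3": False, "article_4": True}
print(precision(manual_labels))  # 0.75
```

Comparing precision before and after a query refinement shows whether the refinement actually removed off-topic articles.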

Challenges

Learning KNIME for workflow and analytics.
Learning Keras and Python in Jupyter Notebook for binary classification.
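For binary classification, each article's text has to be turned into a fixed-length numeric vector with a 0/1 label. The sketch below shows one simple choice, a binary bag-of-words, in plain Python; the tokenization, vocabulary size and example texts are assumptions and are not taken from the project's notebook.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts: list[str], max_terms: int = 1000) -> dict[str, int]:
    """Map the `max_terms` most frequent tokens to column indices."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {term: i for i, (term, _) in enumerate(counts.most_common(max_terms))}


def vectorize(text: str, vocab: dict[str, int]) -> list[int]:
    """Binary bag-of-words: 1 if the vocabulary term occurs in the text, else 0."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec


# Hypothetical labelled examples: 1 = about a viral epidemic, 0 = not.
texts = ["Viral epidemic outbreak in humans", "Diabetes cohort study"]
labels = [1, 0]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
```

The lists `X` and `labels` can be converted with `numpy.array` and fed to a Keras model whose last layer is a single sigmoid unit trained with `binary_crossentropy` loss.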

Issue Rectification

Splitting 950 corpus for `ami search`

The 950-article corpus is large, and running ami search on it raised an OutOfMemoryError.
The disease corpus (CProject) was therefore split into 4 parts of 200-250 CTrees each.
ami search then ran successfully on each part and created the DataTables.
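The split can be scripted. The sketch below copies the CTree folders of a CProject into N sibling CProjects so that ami search can be run on each part separately. The `_partN` naming and the rule that folders starting with `_` are ami output rather than CTrees are assumptions; copying rather than moving keeps the original corpus intact at the cost of disk space.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def split_cproject(cproject: Path, parts: int) -> list[Path]:
    """Copy the CTree folders of `cproject` into `parts` sibling CProjects
    named <name>_part1 ... <name>_partN (hypothetical naming scheme)."""
    # Assumption: every sub-folder is a CTree except ami output folders,
    # which start with "_" (e.g. _cooccurence). Plain files are skipped.
    ctrees = sorted(
        p for p in cproject.iterdir() if p.is_dir() and not p.name.startswith("_")
    )
    size = -(-len(ctrees) // parts)  # ceiling division
    targets: list[Path] = []
    for i in range(parts):
        chunk = ctrees[i * size:(i + 1) * size]
        if not chunk:
            break
        target = cproject.parent / f"{cproject.name}_part{i + 1}"
        target.mkdir(exist_ok=True)
        for ctree in chunk:
            shutil.copytree(ctree, target / ctree.name, dirs_exist_ok=True)
        targets.append(target)
    return targets


# Usage: split_cproject(Path("disease"), 4)
```

With 950 CTrees and `parts=4`, this gives parts of 238, 238, 238 and 236 CTrees.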

`_cooccurence` folder

Initially, on Windows, ami search created an empty _cooccurence folder.
After debugging, ami was updated, and it now produces the expected output in the _cooccurence folder.
This fixed the error.

Update

Uploading corpus to GitHub

(Based on Ambreen's update)

Download VS Code and clone the openVirus repository onto your system.
Open the openVirus folder in VS Code (don't close it).
Minimize VS Code, open your local openVirus folder and make your changes in it.
Return to VS Code and commit the changes by selecting the commit symbol. This may take a while, depending on the size of the files being uploaded.
After adding the remote repository, push the changes to GitHub. See this video for further clarification.

NOTE: If you had already cloned the repository, first pull the latest changes and then push yours.
