miniproject: viral epidemics and disease


Lakshmi Devi Priya edited this page Jun 30, 2020 · 34 revisions

What diseases co-occur with viral epidemics?

owner:

Priya

collaborators:

miniproject summary

proposed activities

Use the communal corpus epidemic50noCov.
Scrutinize the 50 articles to identify true positives and false positives, that is, whether each article is about a viral epidemic or not.
Use ami search to find whether the articles mention any comorbidity in a viral epidemic.
Section the articles using ami:section to extract the relevant information on comorbidity. Annotate with dictionaries to create ami DataTables.
Refine and rerun the query to get a corpus of 950 articles.
Use a relevant ML technique to classify whether the articles are about a viral epidemic and to identify the diseases/disorders that co-occur.
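Because every article in the corpus was retrieved by the query, each manual label is either a true positive (about a viral epidemic) or a false positive (not about one), so the precision of the query follows directly from the labels. A minimal sketch, assuming the labels are available as a list of booleans; the function name and the example labels are hypothetical and are not the project's data:

```python
def precision(labels):
    """Return TP / (TP + FP) for a list of manual relevance labels.

    Each label is True if the retrieved article really is about a
    viral epidemic (true positive) and False otherwise (false positive).
    """
    true_pos = sum(1 for is_relevant in labels if is_relevant)
    false_pos = len(labels) - true_pos
    if true_pos + false_pos == 0:
        raise ValueError("no labelled articles")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical labels for four articles, for illustration only.
    example_labels = [True, True, False, True]
    print(precision(example_labels))  # 0.75
```

Recall cannot be computed this way, since articles the query missed are not in the corpus.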

outcomes

A spreadsheet and a graph of the comorbidities reported during viral epidemics, with their counts.
An ML model for classifying the articles, evaluated on accuracy.
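The spreadsheet of comorbidity counts could be prototyped in plain Python before the AMI DataTables are available. The sketch below is not AMI's method: it assumes the article texts and the disease dictionary terms have already been loaded as strings, counts the number of articles mentioning each term, and writes CSV text. The function names, the column headers, and the sample data are hypothetical.

```python
import csv
import io
import re
from collections import Counter


def count_comorbidities(articles, disease_terms):
    """Count, for each term, how many articles mention it (case-insensitive)."""
    counts = Counter()
    for text in articles:
        for term in disease_terms:
            # Whole-word match so that a short term does not hit inside a longer word.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[term] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text, most frequent term first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["disease", "article_count"])
    for term, n in counts.most_common():
        writer.writerow([term, n])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical article snippets and dictionary terms, for illustration only.
    sample_articles = [
        "Patients with diabetes and hypertension had worse outcomes.",
        "Diabetes was the most frequent comorbidity.",
    ]
    sample_terms = ["diabetes", "hypertension", "asthma"]
    print(to_csv(count_comorbidities(sample_articles, sample_terms)))
```

Counting articles rather than raw mentions keeps one long article from dominating the totals; the resulting CSV can be opened as a spreadsheet or plotted.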

corpora

Initially the communal corpus epidemic50noCov will be used.
Later a corpus of 950 articles will be created.

dictionaries

Disease (https://github.com/petermr/openVirus/tree/master/dictionaries/diseases)

software

AMI for creating and using dictionaries and for sectioning.
SPARQL for creating dictionaries.
KNIME for workflow and analytics.
Python and relevant libraries (keras) for ML and data visualization.

constraints

Progress done

The 50 articles in the communal corpus epidemic50noCov were manually classified (binary) and a spreadsheet was developed.
The corpus was sectioned using ami section, with reference to https://github.com/petermr/openVirus/wiki/ami:section.
getpapers was used to create a corpus of 950 articles on human viral epidemics (except COVID-19) with the command getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o disease_mp -f ve/log.txt -k 955 -x -p. This produced 954 XML files and 913 PDF files.
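The XML and PDF totals can be re-checked after a download with a short script. This is a sketch under an assumption about the output layout: that getpapers writes one subdirectory per article inside the output directory, each holding fulltext.xml and, where available, fulltext.pdf. The function name is hypothetical.

```python
from pathlib import Path


def count_fulltext(corpus_dir):
    """Return (number of fulltext.xml files, number of fulltext.pdf files).

    Assumes one subdirectory per article directly under corpus_dir.
    """
    root = Path(corpus_dir)
    n_xml = sum(1 for _ in root.glob("*/fulltext.xml"))
    n_pdf = sum(1 for _ in root.glob("*/fulltext.pdf"))
    return n_xml, n_pdf


if __name__ == "__main__":
    # disease_mp is the output directory passed to getpapers with -o.
    xml_count, pdf_count = count_fulltext("disease_mp")
    print(f"XML: {xml_count}, PDF: {pdf_count}")
```

A gap between the two counts is expected, since not every article that provides XML also provides a PDF.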