This application provides health data samples for AutoML and other types of applications.
According to the WHO, the top global causes of death, in order of total number of lives lost, are associated with three broad topics (source: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death):
- Cardiovascular (ischaemic heart disease, stroke),
- Respiratory (chronic obstructive pulmonary disease, lower respiratory infections) and
- Neonatal conditions – which include birth asphyxia and birth trauma, neonatal sepsis and infections, and preterm birth complications.
This application provides real data (with no personal data) for some of these top 10 disease scenarios identified by the WHO. The datasets included in this application are:
- Diabetes dataset: data to predict diabetes diagnosis
- Heart Disease dataset: data to predict heart disease
- Kidney Disease dataset: data to predict kidney disease
- Breast Cancer dataset: data to predict breast cancer
- Maternal Risk dataset: data to predict maternal risk level
- Hospital Mortality dataset: data to predict hospital mortality
- World Life Expectancy dataset: data to predict life expectancy based on each country's social and health indicators
- Pollution Deaths from fossil fuels dataset: data to predict deaths caused by fossil fuel pollution
- Dementia dataset: data to predict dementia
- Hepatitis dataset: data to predict death risk from the progression of hepatitis symptoms
- Clone/git pull the repo into any local directory
$ git clone https://github.com/yurimarx/automl-heart.git
- Open a Docker terminal in this directory and run:
$ docker-compose build
- Run the IRIS container:
$ docker-compose up -d
- Query the Heart Disease dataset:
SELECT
age, bp, chestPainType, cholesterol, ekgResults, exerciseAngina, fbsOver120, heartDisease, maxHr, numberOfVesselsFluro, sex, slopeOfSt, stDepression, thallium
FROM dc_data_health.HeartDisease
- Query the Kidney Disease dataset:
SELECT
age, al, ane, appet, ba, bgr, bp, bu, cad, classification, dm, hemo, htn, pc, pcc, pcv, pe, pot, rbc, rc, sc, sg, sod, su, wc
FROM dc_data_health.KidneyDisease
- Query the Diabetes dataset:
SELECT
Outcome, age, bloodpressure, bmi, diabetespedigree, glucose, insulin, pregnancies, skinthickness
FROM dc_data_health.Diabetes
- Query the Breast Cancer dataset:
SELECT
areamean, arease, areaworst, compactnessmean, compactnessse, compactnessworst, concavepointsmean, concavepointsse, concavepointsworst, concavitymean, concavityse, concavityworst, diagnosis, fractaldimensionmean, fractaldimensionse, fractaldimensionworst, perimetermean, perimeterse, perimeterworst, radiusmean, radiusse, radiusworst, smoothnessmean, smoothnessse, smoothnessworst, symmetrymean, symmetryse, symmetryworst, texturemean, texturese, textureworst
FROM dc_data_health.BreastCancer
- Query the Maternal Health Risk dataset:
SELECT
BS, BodyTemp, DiastolicBP, HeartRate, RiskLevel, SystolicBP, age
FROM dc_data_health.MaternalHealthRisk
- Query the Hospital Mortality dataset:
SELECT
age, aniongap, atrialfibrillation, basophils, bicarbote, bloodcalcium, bloodpotassium, bloodsodium, bmi, chdwithnomi, chloride, copd, creatinekise, creatinine, deficiencyanemias, depression, diabetes, diastolicbloodpressure, ef, gendera, glucose, "group", heartrate, hematocrit, hyperlipemia, hypertensive, inr, lacticaacid, leucocyte, lymphocyte, magnesiumion, mch, mchc, mcv, neutrophils, ntprobnp, outcome, pco2, ph, platelets, pt, rbc, rdw, relfailure, respiratoryrate, spo2, systolicbloodpressure, temperature, ureanitrogen, urineoutput
FROM dc_data_health.HospitalMortality
- Query the Life Expectancy dataset:
SELECT
AdultMortality, Alcohol, BMI, Country, Diphtheria, GDP, HIVAIDS, HepatitisB, IncomeCompositionOfResources, InfantDeaths, LifeExpectancy, Measles, PercentageExpenditure, Polio, Population, Schooling, Status, Thinness1To19Years, Thinness5To9Years, TotalExpenditure, UnderFiveDeaths, Year
FROM dc_data_health.LifeExpectancy
- Query the Pollution Deaths dataset:
SELECT
Country, CountryCode, DeathYear, ExcessMortality
FROM dc_data_health.PollutionDeaths
- Query the Dementia dataset:
SELECT
ASF, Age, CDR, EDUC, Genre, Hand, MMSE, MRDelay, Outcome, SES, Visit, eTIV, nWBV
FROM dc_data_health.Dementia
- Query the Hepatitis Death Risk dataset:
SELECT
age, albumin, alkphosphate, anorexia, antivirals, ascites, bilirubin, fatigue, histology, liverbig, liverfirm, malaise, outcome, protime, sex, sgot, spiders, spleenpalpable, steroid, varices
FROM dc_data_health.Hepatitis
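The queries above all follow the same pattern, so they can also be generated programmatically. The following is a minimal Python sketch and is not part of the application: the `TARGETS` mapping and the `build_select` helper are hypothetical names, and treating the listed columns as prediction targets is an assumption inferred from the column lists above. It only builds SQL strings; connecting to IRIS and running them is not shown.

```python
# Hypothetical helper: builds SELECT statements for the dataset tables.
# Table names come from the queries above. Treating these columns as the
# prediction targets is an assumption, not something the application defines.
TARGETS = {
    "HeartDisease": "heartDisease",
    "KidneyDisease": "classification",
    "Diabetes": "Outcome",
    "BreastCancer": "diagnosis",
    "MaternalHealthRisk": "RiskLevel",
    "HospitalMortality": "outcome",
    "LifeExpectancy": "LifeExpectancy",
    "PollutionDeaths": "ExcessMortality",
    "Dementia": "Outcome",
    "Hepatitis": "outcome",
}

SCHEMA = "dc_data_health"


def build_select(table, columns=None, limit=None):
    """Return a SELECT statement for one of the dataset tables.

    columns: optional list of column names (defaults to all columns).
    limit:   optional row cap, emitted as a TOP clause.
    """
    if table not in TARGETS:
        raise ValueError(f"unknown dataset table: {table}")
    column_list = ", ".join(columns) if columns else "*"
    top = f"TOP {int(limit)} " if limit is not None else ""
    return f"SELECT {top}{column_list} FROM {SCHEMA}.{table}"


if __name__ == "__main__":
    # Sample a few feature columns plus the assumed target column.
    print(build_select("Diabetes", ["age", "bmi", "glucose", TARGETS["Diabetes"]], limit=5))
```

The helper does no quoting, so a reserved word such as the `"group"` column in the Hospital Mortality table has to be passed already quoted, as in the query above.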
The application is packaged with ZPM, so it can be installed with:
zpm "install dataset-health"
- MIT License for this Application
- CC BY-NC-SA 4.0 License for the Breast Cancer Dataset
- Original Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
- File in the app: /opt/irisapp/data/breast-cancer.csv
- Persistent Class: dc.data.health.BreastCancer
- CC0: Public Domain for Diabetes Dataset
- Original Source: https://www.kaggle.com/mathchi/diabetes-data-set
- File in the app: /opt/irisapp/data/diabetes.csv
- Persistent Class: dc.data.health.Diabetes
- CC0: Public Domain for Heart Disease
- Original Source: https://data.world/informatics-edu/heart-disease-prediction
- File in the app: /opt/irisapp/data/heart-disease.csv
- Persistent Class: dc.data.health.HeartDisease
- CC0: Public Domain for Maternal Health Risk
- Original Source: https://www.kaggle.com/yasserhessein/classification-maternal-health-5-algorithms-ml/data
- File in the app: /opt/irisapp/data/maternal_health_risk.csv
- Persistent Class: dc.data.health.MaternalHealthRisk
- CC0: Public Domain for World Life Expectancy
- Original Source: https://www.kaggle.com/kumarajarshi/life-expectancy-who (the data was collected from the WHO and United Nations websites with the help of Deeksha Russell and Duan Wang)
- File in the app: /opt/irisapp/data/life_expectancy.csv
- Persistent Class: dc.data.health.LifeExpectancy
- CC0 1.0 Universal (CC0 1.0) Public Domain Dedication for Hospital Mortality
- Original Source: https://www.kaggle.com/saurabhshahane/in-hospital-mortality-prediction (Zhou, Jingmin et al. (2021), Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database, Dryad, Dataset, https://doi.org/10.5061/dryad.0p2ngf1zd)
- File in the app: /opt/irisapp/data/hospital_mortality.csv
- Persistent Class: dc.data.health.HospitalMortality
- CC0 1.0 Universal (CC0 1.0) Public Domain for Pollution Deaths dataset
- Original Source: https://www.kaggle.com/mathurinache/pollution-deaths
- File in the app: /opt/irisapp/data/pollution-deaths-from-fossil-fuels.csv
- Persistent Class: dc.data.health.PollutionDeaths
- Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO) for Dementia dataset
- Original Source: https://www.kaggle.com/shashwatwork/dementia-prediction-dataset
- File in the app: /opt/irisapp/data/dementia.csv
- Persistent Class: dc.data.health.Dementia
- CC0 1.0 Universal (CC0 1.0) Public Domain for Hepatitis Death Risk dataset
- Original Source: https://www.kaggle.com/codebreaker619/hepatitis-data
- File in the app: /opt/irisapp/data/hepatitis.csv
- Persistent Class: dc.data.health.Hepatitis
- CC0: Public Domain for Kidney Disease
- Original Source: Dua, Dheeru and Graff, Casey (2017). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml
- File in the app: /opt/irisapp/data/kidney_disease.csv
- Persistent Class: dc.data.health.KidneyDisease
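Each entry above pairs a CSV file with a persistent class, and the queries earlier in this document use the matching SQL table name, where the dots in the package name become underscores (dc.data.health.Hepatitis is queried as dc_data_health.Hepatitis). A minimal Python sketch of that mapping follows; `class_to_table` and `DATASETS` are hypothetical names used only for illustration.

```python
def class_to_table(class_name):
    """Convert a persistent class name to the SQL table name used in the queries.

    Dots in the package part become underscores; the last segment is the table.
    """
    package, _, short_name = class_name.rpartition(".")
    if not package or not short_name:
        raise ValueError(f"expected a package-qualified class name: {class_name!r}")
    return f"{package.replace('.', '_')}.{short_name}"


# CSV paths and persistent classes exactly as listed above.
DATASETS = {
    "/opt/irisapp/data/breast-cancer.csv": "dc.data.health.BreastCancer",
    "/opt/irisapp/data/diabetes.csv": "dc.data.health.Diabetes",
    "/opt/irisapp/data/heart-disease.csv": "dc.data.health.HeartDisease",
    "/opt/irisapp/data/maternal_health_risk.csv": "dc.data.health.MaternalHealthRisk",
    "/opt/irisapp/data/life_expectancy.csv": "dc.data.health.LifeExpectancy",
    "/opt/irisapp/data/hospital_mortality.csv": "dc.data.health.HospitalMortality",
    "/opt/irisapp/data/pollution-deaths-from-fossil-fuels.csv": "dc.data.health.PollutionDeaths",
    "/opt/irisapp/data/dementia.csv": "dc.data.health.Dementia",
    "/opt/irisapp/data/hepatitis.csv": "dc.data.health.Hepatitis",
    "/opt/irisapp/data/kidney_disease.csv": "dc.data.health.KidneyDisease",
}

if __name__ == "__main__":
    # Print which table each source file ends up in.
    for csv_path, class_name in DATASETS.items():
        print(f"{csv_path} -> {class_to_table(class_name)}")
```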