Machine learning and big data processing remain largely unexplored in health genomics and precision medicine. But recent studies have demonstrated that machine and deep learning applied to large clinical trials enable researchers and clinicians to create more accurate patient profiles, resulting in improved diagnosis and treatment of diseases.
As part of my capstone project for The Data Incubator program (Winter 2020 cohort), I aimed to evaluate the relevance of data science tools in Health Genomics and publish my findings, scripts and interactive visualization tools. This work is relevant not only to other researchers but also to clinicians and the medical industry in general.
The project uses machine learning to predict:
- Predisposition to genetic diseases in humans, using a simulated dataset
- Local adaptation of livestock, using cattle genetic data from Uganda
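
To give a feel for the first task, here is a minimal sketch of predicting a simulated disease status from genotype data. It is an illustration only and not the project's actual pipeline: the use of scikit-learn's `RandomForestClassifier`, the dataset size, the allele frequency, and the choice of SNPs 0 and 1 as causal variants are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical simulation: 600 individuals, 50 SNPs coded 0/1/2
# (number of minor-allele copies). These values are assumptions,
# not the parameters used in the project's Data_Sim/ datasets.
rng = np.random.default_rng(0)
n_samples, n_snps = 600, 50
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps))

# Assumption: SNPs 0 and 1 are causal; each minor allele raises disease risk.
logit = -2.0 + 1.5 * genotypes[:, 0] + 1.5 * genotypes[:, 1]
risk = 1.0 / (1.0 + np.exp(-logit))
phenotype = rng.binomial(1, risk)  # 1 = affected, 0 = unaffected

# Hold out 25% of individuals to estimate predictive accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    genotypes, phenotype, test_size=0.25, random_state=0, stratify=phenotype
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Accuracy on held-out individuals, and the SNPs the forest relied on most.
accuracy = model.score(X_test, y_test)
top_snps = np.argsort(model.feature_importances_)[::-1][:5]
print(f"held-out accuracy: {accuracy:.2f}")
print("top SNPs by importance:", top_snps.tolist())
```

Ranking SNPs by feature importance is one simple way to check whether a model recovers the simulated causal variants; other classifiers or importance measures could be swapped in.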
A summary of the project and its deliverables can be found here
- Genetic predisposition to a simulated disease in humans
- Environmental adaptation in Ugandan Cattle
- Other scripts can be found in the `Scripts/` folder, or directly through the Project Description page
- Datasets for the human simulations and Ugandan cattle can be found in `Data_Sim/` and `Data_UGBT/`, respectively
- Compare current results with VariantSpark, a fast and scalable random forest classifier.
- Compare current results with deep learning approaches. For example
- Interactive Manhattan plots and maps of global distribution.