Welcome to my Health Data Science (HDS) Practice project! The HDS_practice repository is my space for learning and practicing the concepts, tools, and techniques of health data science.
Along the way, I will cover topics like data analysis, machine learning, and data visualization, and apply them to health data. The repository will include code examples, notes, and projects that reflect my progress.
This repository will cover work from the following courses, each focusing on different aspects of health data science:
- Statistical Foundations for Health Data Science (R): Biostatistics, probability theory, hypothesis testing, and statistical methods for health data.
- Computing for Health Data Science (Python): Python programming, including data cleaning, manipulation, and basic data science workflows.
- Management and Curation of Health Data (SAS): Health data storage, cleaning, management, and data curation using SAS.
- Data Structures and Algorithms (C): Implementation of data structures and algorithms relevant to health data.
- Context of Health Data Science (Notes): Notes and insights into the broader context and challenges of health data science.
- Health Data Analytics: Machine Learning (Python): Implementation of machine learning algorithms for health data.
- Health Data Analytics: Statistical Modelling (R): Statistical modeling approaches for health data, including linear and logistic regression.
- Database Systems (RDBMS): Relational databases, SQL, and database management for health data.
- Visualization and Communication of Health Data (R, Python): Best practices in visualizing health data for effective communication.
- Big Data Management (Hadoop): Handling large-scale health data using Hadoop for big data analytics.
In this project, I will use the following tools, technologies, and methods:
- Programming languages: Python, R, SAS, C
- Data management: SAS, SQL, Hadoop, RDBMS
- Libraries and frameworks: pandas, NumPy, matplotlib, seaborn, scikit-learn, R packages
- Statistical methods: Hypothesis testing, regression models, biostatistics
- Machine learning: Supervised and unsupervised learning, model evaluation
- Data visualization: Effective communication of insights using R and Python
- Big data management: Hadoop and related technologies for managing large health datasets
The repository will be organized into the following sections:
- data/: Contains datasets used for practice and projects.
- notebooks/: Jupyter Notebooks or RMarkdown files documenting analysis and projects.
- scripts/: Python, R, or SAS scripts for specific analyses or tasks.
- projects/: Larger projects applying a range of skills to real-world health data problems.
- notes/: Course notes and reflections.
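As a rough sketch, the folder layout listed above could be created with a short Python script. The function name `scaffold` and the `.gitkeep` placeholder files are assumptions for illustration, not part of this repository; `.gitkeep` is only a common convention for letting Git track otherwise empty folders.

```python
from pathlib import Path

# Folder names taken from the layout described above.
SECTIONS = ["data", "notebooks", "scripts", "projects", "notes"]


def scaffold(root):
    """Create each section folder under `root` and return the created paths.

    Hypothetical helper: existing folders are left untouched.
    """
    root = Path(root)
    created = []
    for name in SECTIONS:
        folder = root / name
        # parents=True also creates `root` if needed; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        # Assumption: an empty .gitkeep file so Git tracks the empty folder.
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created
```

Calling `scaffold(".")` from the repository root would create the five folders in place. Rerunning it is harmless because existing folders and files are not overwritten.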
I will rely on various learning materials, including:
- Course materials from my health data science courses.
- Public health datasets from reputable sources.
- Online documentation and best practices for the tools and languages mentioned above.
This is a personal learning project. Contributions are not expected, but feel free to provide feedback or open issues if you have suggestions for improvement.
This project is licensed under the MIT License. See the LICENSE file for more information.