CSCI-347-Data-Mining: Advanced Data Science Techniques in Medical Diagnostics

Team Members

Philip Gehde
Moiyad Alfawwar

Overview

Welcome to our project repository for CSCI-347 Data Mining. This project applies data mining techniques and algorithms to medical diagnostics, with a specific emphasis on heart disease prediction. Our work is inspired by the potential of machine learning to make early detection and prevention of diseases like heart disease more accurate and accessible.

Dataset

We use the Statlog Heart Disease Dataset from the UCI Machine Learning Repository. This dataset is widely used in data science research. It contains 270 instances with 13 attributes and no missing values, which makes it small enough to inspect by hand and clean enough to analyze without imputation.

Attributes

The dataset features the following attributes, crucial for heart disease diagnosis:

Age
Sex (binary: male or female)
Chest Pain Type (4 values, label-encoded)
Resting Blood Pressure
Serum Cholesterol in mg/dl
Fasting Blood Sugar > 120 mg/dl (binary)
Resting Electrocardiographic Results (0,1,2)
Maximum Heart Rate Achieved
Exercise Induced Angina (binary)
Oldpeak (ST depression induced by exercise relative to rest)
The Slope of the Peak Exercise ST Segment
Number of Major Vessels (0-3) Colored by Fluoroscopy
Thal (3 = normal; 6 = fixed defect; 7 = reversible defect)
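As a rough illustration of how the attribute list above maps onto a record, here is a minimal sketch that parses one row into named fields. It uses only the Python standard library. The short column names are our own hypothetical labels, and the sketch assumes each record is a single line of whitespace-separated numbers with the class label in the last position; check the file you download before relying on that layout.

```python
# Hypothetical short names, in the same order as the attribute list above.
COLUMNS = [
    "age", "sex", "chest_pain", "rest_bp", "chol", "fbs", "rest_ecg",
    "max_hr", "ex_angina", "oldpeak", "slope", "vessels", "thal",
]


def parse_row(line, has_label=True):
    """Parse one whitespace-separated record into (features, label).

    Assumes 13 numeric attributes, optionally followed by a class label.
    """
    values = [float(token) for token in line.split()]
    expected = len(COLUMNS) + (1 if has_label else 0)
    if len(values) != expected:
        raise ValueError(f"expected {expected} fields, got {len(values)}")
    features = dict(zip(COLUMNS, values[:len(COLUMNS)]))
    label = int(values[-1]) if has_label else None
    return features, label


if __name__ == "__main__":
    # Made-up record for illustration only, not taken from the dataset.
    sample = "63 1 4 140 260 0 2 150 0 1.5 2 0 3 1"
    features, label = parse_row(sample)
    print(features["age"], features["thal"], label)
```

In the notebooks themselves, pandas would be the more natural way to load the full file; this sketch only shows the field order.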

Project Structure

This repository contains Jupyter Notebooks covering the following key topics:

Graph Analysis: Exploration of data relationships and patterns.
Linear Transformation: Application of linear algebra techniques to optimize data representation.
K-Means Clustering: Unsupervised learning method to identify data clusters.
Additional notebooks will explore various data preprocessing, analysis, and machine learning techniques relevant to our project's goal.
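To make the linear transformation and K-Means topics above more concrete, here is a minimal standard-library sketch, not the code from our notebooks. It z-scores each column (a per-feature shift and scale) and then runs Lloyd's algorithm. The function names are hypothetical, the toy values are made up, and seeding the centroids with the first k points is a simplification chosen to keep the result deterministic; in practice a library implementation such as scikit-learn's KMeans is the usual choice.

```python
import math


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm, seeded with the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up (age, maximum heart rate) pairs for illustration only.
    toy = [[40, 180], [42, 175], [45, 170], [65, 120], [68, 115], [70, 110]]
    labels, centroids = kmeans(standardize(toy), k=2)
    print(labels)
```

Standardizing first matters because K-Means relies on Euclidean distance, so a feature measured on a larger scale would otherwise dominate the clustering.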

Objectives

Our project aims to:

Evaluate and Clean the Dataset: Assess the quality, potential biases, and applicability of the dataset for heart disease diagnostics.
Apply Data Mining Techniques: Utilize various algorithms to uncover patterns and insights that could inform medical diagnostics.
Enhance Medical Diagnostic Work: Explore how machine learning can improve diagnostic accuracy, focusing on heart disease.

Motivation

Our personal experiences and observations in the medical field highlight the urgent need for improved diagnostics. This project is not just an academic exercise; it's a step toward leveraging data science for real-world medical advancements.

Usage

To get started with our notebooks:

Clone this repository to your local machine.
Ensure you have Jupyter Notebook installed, or use Google Colab for an online alternative.
Open the notebooks and follow the instructions within to replicate our analyses.

Dependencies

Python 3.x
Jupyter Notebook
Libraries: NumPy, pandas, matplotlib, scikit-learn, etc. (A full list of dependencies is available in the requirements.txt file.)
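Before opening the notebooks, it can help to confirm that the libraries listed above are importable. The following is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list covers only the four libraries named here, not whatever else the notebooks may import.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. Note that scikit-learn
# is imported under the name "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```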

Contributing

We welcome contributions from the data science community. Whether it's improving the code, suggesting new analysis techniques, or discussing the implications of our findings, your input is valuable.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Our heartfelt gratitude goes to the researchers and contributors of the Statlog Heart Disease Dataset at the UCI Machine Learning Repository. Their work provides the foundation for our project and many others in the field of medical diagnostics.

Join us in this exploratory journey through data science to make a tangible impact on medical diagnostics. Together, we can push the boundaries of what's possible in healthcare through the power of data mining.
