codephilip/Data-Mining

CSCI-347-Data-Mining: Advanced Data Science Techniques in Medical Diagnostics

Team Members

  • Philip Gehde
  • Moiyad Alfawwar

Overview

This repository holds our project for CSCI-347 Data Mining. We apply data science techniques and algorithms to medical diagnostics, with a specific emphasis on heart disease prediction. The work is inspired by the potential of machine learning to make early detection and prevention of conditions such as heart disease more accurate and accessible.

Dataset

We use the Statlog Heart Disease Dataset from the UCI Machine Learning Repository, which has been used in many research studies. It contains 270 instances with 13 attributes and no missing values, so it can be analyzed without imputing missing data.

Attributes

The dataset features the following attributes, crucial for heart disease diagnosis:

  • Age
  • Sex (binary: male or female)
  • Chest Pain Type (4 values, label-encoded)
  • Resting Blood Pressure
  • Serum Cholesterol in mg/dl
  • Fasting Blood Sugar > 120 mg/dl (binary)
  • Resting Electrocardiographic Results (values 0, 1, 2)
  • Maximum Heart Rate Achieved
  • Exercise Induced Angina (binary)
  • Oldpeak (ST depression induced by exercise relative to rest)
  • The Slope of the Peak Exercise ST Segment
  • Number of Major Vessels (0-3) Colored by Fluoroscopy
  • Thal (3 = normal; 6 = fixed defect; 7 = reversible defect)
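As a minimal sketch of how the attributes above could be loaded with pandas: the short column names, the trailing `target` column, and the whitespace-delimited layout are our assumptions for illustration, so adjust them to match the file you download. The two sample rows are made up and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical short column names, in the order of the attribute list above,
# plus an assumed final "target" column for the diagnosis label.
COLUMNS = [
    "age", "sex", "chest_pain", "rest_bp", "chol", "fbs", "rest_ecg",
    "max_hr", "ex_angina", "oldpeak", "slope", "n_vessels", "thal", "target",
]


def load_heart(source):
    """Read a whitespace-delimited file (or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


# Two made-up rows for illustration only; not real patient records.
sample = io.StringIO(
    "63 1 4 140 260 0 2 120 1 1.5 2 1 7 2\n"
    "45 0 2 120 210 0 0 165 0 0.0 1 0 3 1\n"
)
df = load_heart(sample)
print(df.shape)               # rows x columns
print(df.isna().sum().sum())  # total number of missing values
```

To use the real data, pass the path of the downloaded file to `load_heart` instead of the in-memory sample.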

Project Structure

This repository contains Jupyter Notebooks covering the following key topics:

  • Graph Analysis: Exploration of data relationships and patterns.
  • Linear Transformation: Application of linear algebra techniques to optimize data representation.
  • K-Means Clustering: Unsupervised learning method to identify data clusters.
  • Additional notebooks will explore various data preprocessing, analysis, and machine learning techniques relevant to our project's goal.
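The Linear Transformation and K-Means Clustering topics listed above can be sketched together with scikit-learn. This is not the code from our notebooks: it uses PCA as one example of a linear transformation, runs on synthetic data standing in for the 13 attributes, and the cluster count of 2 is an assumption chosen for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: two well-separated groups of 50 points
# in 13 dimensions (one dimension per attribute).
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 13))
group_b = rng.normal(loc=4.0, scale=1.0, size=(50, 13))
X = np.vstack([group_a, group_b])

# Standardize each column to zero mean and unit variance, so that
# attributes on large scales do not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# Linear transformation: project onto the first two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# K-Means on the projected data; n_clusters=2 is assumed for this sketch.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print(X_2d.shape)
print(np.bincount(labels))  # number of points assigned to each cluster
```

Standardizing before PCA and K-Means matters for data like ours, where attributes such as cholesterol and the binary flags live on very different scales.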

Objectives

Our project aims to:

  1. Evaluate and Clean the Dataset: Assess the quality, potential biases, and applicability of the dataset for heart disease diagnostics.
  2. Apply Data Mining Techniques: Utilize various algorithms to uncover patterns and insights that could inform medical diagnostics.
  3. Enhance Medical Diagnostic Work: Explore how machine learning can improve diagnostic accuracy, focusing on heart disease.
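For the first objective, a quality check might start with something like the sketch below. The `quality_report` helper and the toy table are hypothetical and only illustrate the kinds of checks we mean (missing values, duplicate rows, class balance); they are not results from the real dataset.

```python
import pandas as pd


def quality_report(df, target):
    """Summarize basic data-quality indicators for a DataFrame."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "n_duplicates": int(df.duplicated().sum()),
        # Share of rows per class label, to spot class imbalance.
        "class_share": df[target].value_counts(normalize=True).to_dict(),
    }


# Toy table with one missing value and one duplicated row (made-up values).
toy = pd.DataFrame({
    "age": [63, 45, 45, 58],
    "chol": [260, 210, 210, None],
    "target": [2, 1, 1, 2],
})

report = quality_report(toy, "target")
print(report)
```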

Motivation

Our personal experiences and observations in the medical field highlight the urgent need for improved diagnostics. This project is not just an academic exercise; it's a step toward leveraging data science for real-world medical advancements.

Usage

To get started with our notebooks:

  1. Clone this repository to your local machine.
  2. Ensure you have Jupyter Notebook installed, or use Google Colab for an online alternative.
  3. Open the notebooks and follow the instructions within to replicate our analyses.

Dependencies

  • Python 3.x
  • Jupyter Notebook
  • Libraries: NumPy, pandas, matplotlib, scikit-learn, and others (the full list is in the requirements.txt file)
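Before opening the notebooks, a quick way to confirm that the main libraries are importable is sketched below. The `missing_modules` helper is hypothetical, and the list covers only the four libraries named above, not everything in requirements.txt.

```python
import importlib.util

# Import names of the libraries named above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED))  # an empty list means all were found
```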

Contributing

We welcome contributions from the data science community. Whether it's improving the code, suggesting new analysis techniques, or discussing the implications of our findings, your input is valuable.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Our heartfelt gratitude goes to the researchers and contributors of the Statlog Heart Disease Dataset at the UCI Machine Learning Repository. Their work provides the foundation for our project and many others in the field of medical diagnostics.


Join us in this exploratory journey through data science to make a tangible impact on medical diagnostics. Together, we can push the boundaries of what's possible in healthcare through the power of data mining.
