Data Analysis Program

The Data Analysis Program is a desktop application built with PyQt5, Pandas, Seaborn, and Matplotlib for data exploration, analysis, and visualization. It provides a graphical interface for loading datasets, exploring their structure, generating descriptive statistics, running correlation analysis, and training a basic machine learning model.

Features

Data Loading and Exploration

  • Load Datasets:

    • Seaborn Datasets: Choose a sample dataset from the Seaborn datasets library.
    • Local Datasets: Leave the combo box set to "None" to load a local dataset from your device.
  • Inspect Dataset:

    • View the head and tail of the dataset.
    • Display the data type of each column.
    • Show the number of null values in each column and fill NA values, with options to ignore "0" values and encode boolean values.
    • View the dataset's summary statistics.
    • Get information for each column.
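
The inspection steps above map onto a handful of Pandas calls. The following is a minimal sketch, not the program's actual code: the function name `inspect_dataset` and the sample frame are hypothetical. In practice the frame would come from `seaborn.load_dataset(...)` or `pandas.read_csv(...)`.

```python
import pandas as pd


def inspect_dataset(df: pd.DataFrame, n: int = 5) -> dict:
    """Collect the basic inspection views into one dictionary (hypothetical helper)."""
    return {
        "head": df.head(n),                # first n rows
        "tail": df.tail(n),                # last n rows
        "dtypes": df.dtypes,               # data type of each column
        "null_counts": df.isnull().sum(),  # number of nulls per column
        "statistics": df.describe().T,     # summary statistics, one row per numeric column
    }


# Small stand-in frame; a Seaborn sample or a local file would be used instead.
sample = pd.DataFrame(
    {
        "age": [22.0, None, 35.0, 58.0],
        "fare": [7.25, 71.28, 0.0, 26.55],
        "alone": [True, False, True, None],
    }
)

report = inspect_dataset(sample, n=2)
print(report["null_counts"])
```

Returning the views in a dictionary keeps the computation separate from the widgets that display it, which is one reasonable way to structure a GUI around Pandas.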

Data Editing for Machine Learning

  • Prepare Data:

    • Fill null values and encode non-numeric data types to make the dataset machine learning-ready.
  • Basic Machine Learning Model:

    • Create a basic RandomForest model and save it for future predictions.
    • Predict outcomes from manually entered data or from randomly generated data, which can be edited before predicting.
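
The preparation and modeling steps above might look roughly like the sketch below. It rests on several assumptions that the README does not confirm: median and mode imputation, integer encoding via `pandas.factorize`, a `RandomForestClassifier` from scikit-learn, and `pickle` for saving. The helper name `prepare_for_ml` and the sample data are hypothetical.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def prepare_for_ml(df: pd.DataFrame) -> pd.DataFrame:
    """Fill nulls and encode non-numeric columns (one possible strategy)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_bool_dtype(out[col]):
            # Booleans become 0/1.
            out[col] = out[col].astype(int)
        elif pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: numeric gaps are filled with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: other gaps get the most frequent value, then integer codes.
            filled = out[col].fillna(out[col].mode().iloc[0])
            out[col] = pd.factorize(filled)[0]
    return out


# Hypothetical sample data with a binary target column.
raw = pd.DataFrame(
    {
        "age": [22.0, None, 35.0, 58.0, 41.0, 19.0],
        "sex": ["male", "female", None, "female", "male", "female"],
        "alone": [True, False, True, False, True, False],
        "survived": [0, 1, 0, 1, 0, 1],
    }
)

ready = prepare_for_ml(raw)
X = ready.drop(columns="survived")
y = ready["survived"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Serialize and restore in memory; writing the bytes to a file would persist the model.
restored = pickle.loads(pickle.dumps(model))

# Draw one random row as a stand-in for "random data" and predict on it.
random_row = X.sample(n=1, random_state=1)
prediction = restored.predict(random_row)
print(prediction)
```

A regression target would call for `RandomForestRegressor` instead; the README does not say which one the program uses.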

Column Categorization

  • Categorize Columns:
    • Identify categorical and numerical columns, distinguishing between high-cardinality categorical columns and numerical columns treated as categorical.
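
One common way to implement this categorization is to compare each column's number of unique values against two thresholds. The sketch below assumes that approach. The function name `categorize_columns` and the threshold values (10 and 20) are illustrative, not taken from the program.

```python
import pandas as pd


def categorize_columns(df: pd.DataFrame, cat_threshold: int = 10, car_threshold: int = 20):
    """Split columns into categorical, numerical, and high-cardinality categorical."""
    numeric = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
    non_numeric = [c for c in df.columns if c not in numeric]

    # Numeric columns with few distinct values are treated as categorical.
    num_but_cat = [c for c in numeric if df[c].nunique() < cat_threshold]
    # Non-numeric columns with many distinct values are high-cardinality.
    cat_but_car = [c for c in non_numeric if df[c].nunique() > car_threshold]

    cat_cols = [c for c in non_numeric + num_but_cat if c not in cat_but_car]
    num_cols = [c for c in numeric if c not in num_but_cat]
    return cat_cols, num_cols, cat_but_car


# Hypothetical sample data with one column of each kind.
frame = pd.DataFrame(
    {
        "name": [f"passenger_{i}" for i in range(30)],  # 30 unique strings
        "sex": ["male", "female"] * 15,                  # 2 unique strings
        "pclass": [1, 2, 3] * 10,                        # numeric, 3 unique values
        "fare": [7.25 + i for i in range(30)],           # numeric, 30 unique values
    }
)

cat_cols, num_cols, cat_but_car = categorize_columns(frame)
print(cat_cols, num_cols, cat_but_car)
```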

Data Analysis

  • Target Variable Summary:

    • Generate summaries of the target variable grouped by other columns, with visualization options.
  • Correlation Analysis:

    • Perform correlation analysis on numerical columns and visualize the correlation matrix as a heatmap.
  • Column Summary:

    • Provide insights into categorical and numerical columns with count plots, histograms, and percentage summaries.
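
The numbers behind these three analyses can be computed with plain Pandas, as in the sketch below. The helper names and sample data are hypothetical, and plotting is left out so the example runs without a display. The resulting matrix could be passed to `seaborn.heatmap(corr, annot=True)`, and a categorical column to `seaborn.countplot`.

```python
import pandas as pd


def target_summary(df: pd.DataFrame, target: str, by: str) -> pd.Series:
    """Mean of the target variable for each value of another column."""
    return df.groupby(by)[target].mean()


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation of the numerical columns only."""
    return df.select_dtypes(include="number").corr()


def category_percentages(df: pd.DataFrame, col: str) -> pd.Series:
    """Share of each category in a column, as a percentage."""
    return df[col].value_counts(normalize=True) * 100


# Hypothetical sample data.
data = pd.DataFrame(
    {
        "sex": ["male", "female", "female", "male"],
        "fare": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 2.0, 3.0, 4.0],
        "survived": [0, 1, 1, 0],
    }
)

by_sex = target_summary(data, "survived", "sex")
corr = correlation_matrix(data)
shares = category_percentages(data, "sex")
print(by_sex, corr, shares, sep="\n\n")
```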