This program provides tools for loading, exploring, and analyzing data, including descriptive statistics, column categorization, target variable summaries, and correlation analysis. With built-in error handling, it supports smooth exploration of datasets and extraction of insights.

anlbora/Seaborn-Data-Analysis

Data Analysis Program

The Data Analysis Program is a desktop application developed with PyQt5, Pandas, Seaborn, and Matplotlib to facilitate data exploration, analysis, and visualization. This program provides a user-friendly interface for loading datasets, exploring their structures, generating descriptive statistics, conducting correlation analysis, and more.

Features

Data Loading and Exploration

  • Load Datasets:

    • Seaborn Datasets: Choose a sample dataset from the Seaborn datasets library.
    • Local Datasets: Load a dataset file from your device by leaving the dataset combo box set to "None".
  • Inspect Dataset:

    • View the head and tail of the dataset.
    • Display the data type of each column.
    • Show the number of null values in each column and fill missing (NA) values, with options to ignore "0" values and to encode boolean values.
    • View the dataset's descriptive statistics.
    • Get information for each column.
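
The loading and inspection steps above can be sketched in plain Pandas. This is a minimal illustration, not the program's actual code: the function names `load_dataset`, `inspect`, and `fill_na` are hypothetical, and filling numeric gaps with the median and other gaps with the most frequent value is an assumed strategy that the README does not specify.

```python
import io

import pandas as pd


def load_dataset(name=None, path=None):
    """Load a Seaborn sample dataset by name, or a local CSV when name is None."""
    if name is None:
        if path is None:
            raise ValueError("Provide a file path when no Seaborn dataset is selected.")
        return pd.read_csv(path)
    import seaborn as sns  # imported lazily; sample datasets are downloaded on first use
    return sns.load_dataset(name)


def inspect(df, n=5):
    """Collect the basic inspection views as a dict of pandas objects."""
    return {
        "head": df.head(n),
        "tail": df.tail(n),
        "dtypes": df.dtypes,
        "null_counts": df.isnull().sum(),
        "stats": df.describe().T,
    }


def fill_na(df, encode_bool=False):
    """Fill missing values (assumed strategy: median for numbers, mode otherwise)."""
    out = df.copy()
    for col in out.columns:
        if out[col].isnull().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                mode = out[col].mode()
                if not mode.empty:  # a fully empty column has no mode to fill with
                    out[col] = out[col].fillna(mode.iloc[0])
        if encode_bool and pd.api.types.is_bool_dtype(out[col]):
            out[col] = out[col].astype(int)  # True/False -> 1/0
    return out


# A small in-memory CSV stands in for a local file.
csv_file = io.StringIO("age,city,member\n30,Paris,True\n,Rome,False\n50,,True\n")
df = load_dataset(path=csv_file)
report = inspect(df)
clean = fill_na(df, encode_bool=True)
print(report["null_counts"])
print(clean)
```

Passing a dataset name such as "tips" instead of a path would go through `seaborn.load_dataset`, which needs network access the first time a dataset is fetched.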

Data Editing for Machine Learning

  • Prepare Data:

    • Fill null values and encode non-numeric data types to make the dataset machine learning-ready.
  • Basic Machine Learning Model:

    • Create a basic RandomForest model and save it for future predictions.
    • Predict outcomes from values you type in or from random data, which you can edit before predicting.
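
As a rough sketch of the workflow above, the following prepares a small table, trains a random forest, saves it, and predicts on a random row. Several things here are assumptions rather than the program's confirmed behavior: the use of scikit-learn's `RandomForestClassifier` (the README only says "RandomForest"), `joblib` for saving, label encoding via category codes, and the hypothetical `prepare` helper, column names, and sample data.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def prepare(df):
    """Fill nulls and encode non-numeric columns (assumed strategy)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_bool_dtype(out[col]):
            out[col] = out[col].astype(int)
        elif pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # Label-encode text columns; missing values get the code -1.
            out[col] = out[col].astype("category").cat.codes
    return out


# Hypothetical data with a binary target column named "churn".
raw = pd.DataFrame({
    "age": [22, 35, None, 58, 41, 29],
    "city": ["Rome", "Paris", "Rome", None, "Paris", "Rome"],
    "churn": [0, 1, 0, 1, 1, 0],
})
data = prepare(raw)
X, y = data.drop(columns="churn"), data["churn"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save the fitted model, then reload it as a later session would.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# "Random data" is interpreted here as a randomly sampled row, which
# could be edited before predicting.
random_row = X.sample(1, random_state=0)
prediction = loaded.predict(random_row)
print(random_row, prediction)
```

A regression target would call for `RandomForestRegressor` instead; the README does not say which case the program handles.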

Column Categorization

  • Categorize Columns:
    • Identify categorical and numerical columns, distinguishing between high-cardinality categorical columns and numerical columns treated as categorical.
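
One common way to implement this kind of categorization is to compare each column's number of unique values against two thresholds. The sketch below assumes that approach; the function name `categorize_columns` and the threshold values `cat_th=10` and `car_th=20` are illustrative choices, not values taken from the program.

```python
import pandas as pd


def categorize_columns(df, cat_th=10, car_th=20):
    """Split columns into four groups using unique-value counts.

    cat_th: numeric columns with fewer unique values are treated as categorical.
    car_th: non-numeric columns with more unique values are high-cardinality.
    """
    groups = {
        "categorical": [],
        "numerical": [],
        "numerical_but_categorical": [],
        "categorical_high_cardinality": [],
    }
    for col in df.columns:
        n_unique = df[col].nunique()
        is_number = (pd.api.types.is_numeric_dtype(df[col])
                     and not pd.api.types.is_bool_dtype(df[col]))
        if is_number and n_unique < cat_th:
            groups["numerical_but_categorical"].append(col)
        elif is_number:
            groups["numerical"].append(col)
        elif n_unique > car_th:
            groups["categorical_high_cardinality"].append(col)
        else:
            groups["categorical"].append(col)
    return groups


# Hypothetical 30-row dataset with one column of each kind.
df = pd.DataFrame({
    "id_code": [f"u{i}" for i in range(30)],          # 30 distinct strings
    "city": ["Rome", "Paris", "Oslo"] * 10,           # 3 distinct strings
    "rating": [1, 2, 3] * 10,                         # numeric, 3 distinct values
    "income": [1000.0 + 37 * i for i in range(30)],   # numeric, 30 distinct values
})
groups = categorize_columns(df)
print(groups)
```

High-cardinality columns such as identifiers are usually excluded from count plots and group-by summaries, which is why they are kept in their own group.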

Data Analysis

  • Target Variable Summary:

    • Generate summaries of the target variable grouped by other columns, with visualization options.
  • Correlation Analysis:

    • Perform correlation analysis on numerical columns and visualize the correlation matrix as a heatmap.
  • Column Summary:

    • Provide insights into categorical and numerical columns with count plots, histograms, and percentage summaries.
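
The numbers behind these three analyses can be computed with a few Pandas calls. The following is a sketch under assumptions: the helper names, the choice of mean and count as the target summary, and the sample data are all hypothetical, and plotting is left out so the example runs without a display.

```python
import pandas as pd


def target_summary(df, target, by):
    """Mean and count of the target variable for each value of another column."""
    return df.groupby(by)[target].agg(["mean", "count"])


def correlation_matrix(df):
    """Pearson correlation between all numerical columns."""
    return df.select_dtypes(include="number").corr()


def categorical_summary(df, col):
    """Counts and percentages of each category in a column."""
    counts = df[col].value_counts()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / len(df)})


# Hypothetical data with a binary target column named "churn".
df = pd.DataFrame({
    "city": ["Rome", "Paris", "Rome", "Paris"],
    "age": [20, 30, 40, 50],
    "spend": [2.0, 3.0, 4.0, 5.0],
    "churn": [0, 1, 0, 1],
})
summary = target_summary(df, target="churn", by="city")
corr = correlation_matrix(df)
city_summary = categorical_summary(df, "city")
print(summary)
print(corr)
print(city_summary)
```

To visualize the results, the correlation matrix can be passed to `seaborn.heatmap(corr, annot=True)`, and `seaborn.countplot` and `seaborn.histplot` cover the count plots and histograms mentioned above.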
