This repository contains the notes and code I wrote while learning Data Science and AI. It is organized into the following sections:
- Introduction
- Python Basics
- Data Analysis
- Data Visualization
- Machine Learning
- Deep Learning
- Natural Language Processing (NLP)
- Statistics
- Feature Engineering and Exploratory Data Analysis (EDA)
- Resources
Welcome to my Data Science repository! This collection includes all the notes and code I have accumulated while learning Data Science. The purpose of this repository is to serve as a reference for myself and others interested in this field.
The Python Basics section covers the fundamental Python programming concepts needed for data science, including:
- Variables and Data Types
- Control Structures
- Functions
- Libraries: NumPy, Pandas
- Modules and Packages
- File Handling
- Multi-processing and Multi-threading
- Object-Oriented Programming (OOP)
- MongoDB
- Web Scraping
The Data Analysis section contains notes and code on:
- Data Cleaning
- Data Manipulation
- Exploratory Data Analysis (EDA)
The Data Visualization section includes techniques and code for plotting with Python libraries such as:
- Matplotlib
- Seaborn
- Plotly
The Machine Learning section covers the following algorithms and their implementation:
- Supervised Learning
  - Linear Regression
  - Logistic Regression
  - Decision Trees
  - Support Vector Machines (SVM)
  - Naive Bayes
  - K-Nearest Neighbors (K-NN)
- Unsupervised Learning
  - K-Means Clustering
  - DBSCAN
  - Hierarchical Clustering
- Ensemble Techniques
  - Bagging
  - Random Forest
  - Boosting (AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost)
  - Stacking
- Dimensionality Reduction
  - Principal Component Analysis (PCA)
- Time Series Analysis
The Deep Learning section covers deep learning concepts and their practical application with frameworks such as TensorFlow and Keras, including:
- Artificial Neural Networks (ANNs)
  - Activation Functions
  - Forward and Backward Propagation
  - Implementing ANN with Keras
  - Optimization Techniques
- Convolutional Neural Networks (CNNs)
  - Pooling, Padding, and various CNN architectures (VGG, LeNet, AlexNet, Inception, ResNet)
  - Transfer Learning
- Recurrent Neural Networks (RNNs)
  - LSTM, GRU
- Generative Adversarial Networks (GANs)
- Object Detection (YOLO, Custom Models)
The Natural Language Processing (NLP) section includes notes and code for the following tasks:
- Text Preprocessing
- Text Representation (Word Embeddings: Word2Vec, Doc2Vec)
- Text Classification
- Sequence Models (RNN, LSTM, GRU)
- Transformers
- Text Generation
- Named Entity Recognition (NER)
- Sentiment Analysis
The Statistics section contains notes and materials on:
- Descriptive Statistics
  - Measures of Central Tendency
  - Measures of Dispersion
- Probability Distributions
  - Normal Distribution, Binomial Distribution, Poisson Distribution
- Inferential Statistics
  - Hypothesis Testing
    - Z-test, T-test, Chi-Square Test, ANOVA
  - Confidence Intervals
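As a quick illustration of the descriptive statistics topics listed above, here is a minimal sketch using only Python's standard `statistics` module. The `scores` list is made-up sample data, not taken from the notes in this repository.

```python
import statistics

# Hypothetical sample data, invented purely for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
pop_std = statistics.pstdev(scores)  # population standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, population std={pop_std}")
```

`pstdev` treats the data as the whole population; use `statistics.stdev` instead when the data is a sample drawn from a larger population.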
The Feature Engineering and Exploratory Data Analysis (EDA) section covers techniques for handling and preprocessing data, including:
- Handling Missing Values
- Handling Imbalanced Datasets
- Data Interpolation
- Handling Outliers
- Feature Scaling
- Feature Extraction
- Data Encoding
- Covariance and Correlation Analysis
- Various EDA Projects
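To make the feature scaling item above concrete, here is a small sketch of two common approaches, min-max scaling and standardization (z-scores), written in plain Python. The helper names `min_max_scale` and `standardize` and the `ages` data are hypothetical, chosen only for this example. In practice a library such as scikit-learn would typically be used instead.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and population std 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, invented for illustration
ages = [20, 30, 40, 50, 60]

scaled = min_max_scale(ages)
z_scores = standardize(ages)

print(scaled)
print(z_scores)
```

Min-max scaling keeps the original shape of the distribution but is sensitive to outliers, since a single extreme value sets the range. Standardization is usually the safer default when outliers are present.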
The Resources section lists the books, tutorials, and articles that have been instrumental in my learning journey.
My main resources have been Krish Naik and Nitesh (CampusX), along with a few others.
Note: This repository is a work in progress and will be updated continuously as I learn more about data science.