Roadmap to Becoming an Industry-Leading Data Scientist
After many hours of research with industry professionals currently working as Data Scientists at Fortune 500 tech companies, we have aggregated the best free and open-source learning material they used to become Data Scientists.
The roadmap is broken down into a step-by-step action plan of theory and practice resources, so that you cover every important aspect of your Data Science learning.
Things to keep in mind while going through the roadmap:
- The courses we describe can be audited for free and do not need to be bought. This document is not a paid promotion of any of these courses; we recommend them based on community feedback and experience. The same applies to the links and exercises we reference.
- The weekly structure we illustrate may not fit every learner's timeline. In such cases, we advise stretching the stipulated time frame of 1 week per topic (for example, to 2 weeks per topic) rather than breaking the order of the course plan.
Suggested Course Link
Topic | Topic/Tutorial | Exercises |
---|---|---|
SQL | Tutorial (ER Diagrams towards the end of the video are optional) | Exercise 1 Exercise 2 |
Week 3: Databases and SQL for Data Science with Python | String Pattern, Ranges and Operations on Sets Video | Sorting & Grouping Problem |
Week 3: Databases and SQL for Data Science with Python | Functions, Multiple Tables and Sub-Queries Video | Course Lab 1 Course Lab 2 Course Lab 3 |
Week 4: Databases and SQL for Data Science with Python | Methods and Tools to Access Databases with Python | Quiz Course Lab 1 Course Lab 2 |
Week 5: Databases and SQL for Data Science with Python | Hands-on SQL Experience with Real-world Data | Assignment |
SQL Project View | Take a look at a project-level implementation of SQL | College-ERP ansql |
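The following is a minimal sketch that makes the grouping, sorting, string-pattern and sub-query topics concrete by querying a database from Python with the standard-library `sqlite3` module. The `employees` table, its columns and its rows are made up for illustration. The course labs may use a different database engine and driver.

```python
import sqlite3

# In-memory database, so the example needs no files or database server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: employees with a department and a salary (in thousands).
cur.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",  # placeholders avoid SQL injection
    [("Ana", "Data", 120), ("Ben", "Data", 100), ("Cai", "Ops", 90), ("Dee", "Ops", 70)],
)

# Grouping and sorting: average salary per department, highest first.
avg_by_dept = cur.execute(
    "SELECT dept, AVG(salary) AS avg_salary FROM employees "
    "GROUP BY dept ORDER BY avg_salary DESC"
).fetchall()

# Sub-query: employees who earn more than the overall average.
above_avg = cur.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) ORDER BY name"
).fetchall()

# String pattern: names starting with 'D'.
d_names = cur.execute("SELECT name FROM employees WHERE name LIKE 'D%'").fetchall()

conn.close()
print(avg_by_dept)  # [('Data', 110.0), ('Ops', 80.0)]
print(above_avg)    # [('Ana',), ('Ben',)]
```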
Topic | Course/Tutorial | Exercises |
---|---|---|
Excel | Tutorial 1 Tutorial 2 | Excel Interview Questions |
Excel Advanced (preferred if your job description requires sound knowledge of Excel) | Tutorial 1 Tutorial 2 | Excel VBA for Creative problem solving |
Project to follow after completing Excel:
Python Environment Setup
Tutorial to follow
Coding Questions
Topic | Practice Resource | Coding Question |
---|---|---|
Variables | Quiz 1 Quiz 2 Quiz 3 Quiz 4 | Practice from HackerRank given above |
Conditional Statements | Quiz 1 Quiz 2 (solve the Conditional Statements questions only) | Practice from HackerRank given above |
Functions | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6 Quiz 7 | Practice from HackerRank given above |
Control Flow | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6 Quiz 7 | Practice from HackerRank given above |
Bitwise Operators | Quiz 1 Quiz 2 Quiz 3 | Practice from HackerRank given above |
Strings | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6 Quiz 7 Quiz 8 Quiz 9 Quiz 10 Quiz 11 Quiz 12 Quiz 13 Quiz 14 | Practice from HackerRank given above |
Lists, Tuples | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6 Quiz 7 Quiz 8 Quiz 9 Quiz 10 Quiz 11 Quiz 12 | Practice from HackerRank given above |
Dictionaries | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 | Practice from HackerRank given above |
Sets | Quiz 1 Quiz 2 Quiz 3 Quiz 4 Quiz 5 Quiz 6 | Practice from HackerRank given above |
Tutorial to Follow
Topic | Practice Resource | Coding Question |
---|---|---|
Classes and Objects | Quiz 1 Quiz 2 | Python Classes and Objects |
Attributes & Constructors | Quiz | Python Class Attributes Class Instance Attributes Constructors in Python |
Inheritance | Quiz 1 Quiz 2 Quiz 3 | Inheritance example Python Inheritance Questions & Answers |
Overloading | Quiz 1 Quiz 2 | Method Overloading Overloading in Python |
Overriding | Quiz 1 | Method Overriding in Python |
Polymorphism | Quiz 1 | Theory 1 Theory 2 |
Data hiding | Quiz 1 Quiz 2 | Data hiding and object printing |
Regular Expression | Quiz 1 Quiz 2 Quiz 3 | Regex Coding Problems |
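This compact sketch ties the object-oriented rows together. It shows a class with a constructor, class and instance attributes, a subclass that inherits and overrides a method, polymorphic calls, and data hiding via Python's double-underscore name mangling. The `Account` and `SavingsAccount` classes are hypothetical examples and are not taken from the linked tutorials.

```python
class Account:
    # Class attribute: shared by every instance.
    bank_name = "Example Bank"

    def __init__(self, owner, balance=0):
        # Instance attributes: set per object by the constructor.
        self.owner = owner
        # Double underscore triggers name mangling (Python's form of data hiding).
        self.__balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    @property
    def balance(self):
        # Read-only public access to the "hidden" attribute.
        return self.__balance

    def describe(self):
        return f"{self.owner}: {self.balance}"

    def __repr__(self):
        # Controls how the object prints; uses the runtime class name.
        return f"{type(self).__name__}({self.owner!r}, {self.balance})"


class SavingsAccount(Account):
    # Inheritance: reuses Account's constructor, deposit() and balance.
    def __init__(self, owner, balance=0, rate=0.02):
        super().__init__(owner, balance)
        self.rate = rate

    def describe(self):
        # Overriding: replaces the parent's describe().
        return f"{self.owner}: {self.balance} (savings, rate {self.rate})"


accounts = [Account("Ana", 100), SavingsAccount("Ben", 50, rate=0.03)]
accounts[0].deposit(25)

# Polymorphism: the same call dispatches to each class's own describe().
for acct in accounts:
    print(acct.describe())
```

Python has no signature-based method overloading. A second `def` with the same name simply replaces the first, so default arguments such as `balance=0` are the usual substitute. Name mangling deters accidental access but is not true privacy: the attribute is still reachable as `_Account__balance`.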
Tutorial to Follow Link
Coding Questions
Topic | Practice Resource | Coding Question |
---|---|---|
Time Complexity | Quiz 1 Quiz 2 Quiz 3 | Theory |
Recursion | Quiz 1 Quiz 2 | Theory |
Linked List | Practice Problems | Practice from HackerRank given above Additional Problems |
Stacks and Queues | Quiz 1 | Practice from HackerRank given above Additional Problems |
Hashing | Quiz | Practice from HackerRank given above |
Searching Algorithms | Quiz | Practice from HackerRank given above Additional Problems |
Sorting Algorithms | Quiz |
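Recursion, sorting, searching and time complexity come together in two classic routines. The sketch below is a plain-Python merge sort (recursive, O(n log n)) followed by an iterative binary search (O(log n) on sorted input). The function names are our own choices.

```python
def merge_sort(items):
    """Return a new sorted list using recursive merge sort, O(n log n)."""
    if len(items) <= 1:              # base case: 0 or 1 items are already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # recursively sort each half
    right = merge_sort(items[mid:])
    # Merge the two sorted halves in linear time.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1; O(log n) time."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1             # target can only be in the right half
        else:
            hi = mid - 1             # target can only be in the left half
    return -1


data = merge_sort([38, 27, 43, 3, 9, 82, 10])
print(data)                      # [3, 9, 10, 27, 38, 43, 82]
print(binary_search(data, 43))   # 5
print(binary_search(data, 7))    # -1
```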
Topic | Course/Tutorial | Exercises |
---|---|---|
NumPy | Tutorial | Exercise 1 Exercise 2 Quiz |
Pandas | Tutorial | Exercise 1 Exercise 2 Quiz Problems |
Matplotlib | Tutorial | Quiz |
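This small sketch shows the kind of NumPy and Pandas operations these tutorials build toward: vectorized arithmetic with broadcasting, then missing-value imputation and a group-by summary. The data and column names are invented, and mean imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic and broadcasting, no explicit Python loops.
prices = np.array([10.0, 20.0, 30.0])        # shape (3,)
quantities = np.array([[1, 2, 3],
                       [4, 5, 6]])           # shape (2, 3)
totals = quantities * prices                 # prices is broadcast across both rows
print(totals.sum(axis=1))                    # revenue per row: [140. 320.]

# Pandas: hypothetical sales records with one missing value.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "sales": [100, 250, 150, np.nan, 300],
})
# Fill the gap with the column mean (NaN is skipped when computing the mean).
df["sales"] = df["sales"].fillna(df["sales"].mean())
# Count and mean of sales per region, highest mean first.
summary = (
    df.groupby("region")["sales"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)
print(summary)
```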
Topic | Practice Resource | Coding Question |
---|---|---|
Statistics | Course | Statistics Interview Questions |
Probability Advanced | Theory | 40 Probability questions |
Topic | Learning Resource | Coding Question |
---|---|---|
Probability Miscellaneous | Theory | |
Linear Algebra | Theory | Question |
Multivariate Calculus (skip Simple Neural Network and Simple Artificial Neural Network from Week 3) | Theory | |
PCA | Theory |
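PCA can be computed from the singular value decomposition (SVD) of the mean-centered data matrix, which links the linear algebra and PCA rows. The helper below is a minimal NumPy sketch of that approach; the name `pca` and the synthetic data are illustrative. Libraries such as scikit-learn provide tested implementations.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)              # PCA works on zero-mean features
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each component, and its share of the total variance.
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio


rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Two strongly correlated features plus a little noise: most variance lies along one axis.
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])
Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```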
Andrew Ng Stanford Machine Learning Coursera Link
For this course, all exercises in Python: Link
Course Section | Topic | Practice Codes | Additional Resource |
---|---|---|---|
Week 1: Andrew Ng Stanford Machine Learning | What is Machine Learning | | |
Week 1: Andrew Ng Stanford Machine Learning | Supervised Learning | | |
Week 1: Andrew Ng Stanford Machine Learning | Unsupervised Learning | | |
Week 1: Andrew Ng Stanford Machine Learning | Linear Regression with One Variable Cost Function Gradient Descent | Linear Regression without Library in Python One variable Linear Regression One variable Linear Regression (Advanced) | Learn EDA before coding: EDA 1 (EDA Methods Code) EDA 2 (EDA Project) Data Analysis |
Week 1: Andrew Ng Stanford Machine Learning | Linear Algebra Review Vector Arithmetic Operations | Do a quick review, as you already did a detailed one earlier | |
Week 2: Andrew Ng Stanford Machine Learning | Linear Regression with multiple variables Multiple Features Gradient Descent for Multiple Variables Polynomial Regression Normal Equation Non-Invertibility | ML without libraries ML using libraries Project | |
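For the "Linear Regression without Library" exercises, this is a minimal NumPy sketch of the course's squared-error cost, J(θ) = (1/2m) Σ (h_θ(x) − y)², and batch gradient descent. The result is checked against the normal equation. The synthetic data, the learning rate `alpha` and the iteration count are assumptions chosen so that the example converges. With features on very different scales you would normally apply feature scaling first.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Batch gradient descent; X must already include a column of ones."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m   # partial derivatives of J
        theta -= alpha * gradient              # simultaneous update of every theta_j
    return theta


# Hypothetical data: y = 4 + 3*x1 - 2*x2 plus a little noise.
rng = np.random.default_rng(42)
features = rng.uniform(-1, 1, size=(100, 2))
y = 4 + 3 * features[:, 0] - 2 * features[:, 1] + 0.05 * rng.normal(size=100)
X = np.column_stack([np.ones(len(y)), features])   # prepend the intercept column

theta_gd = gradient_descent(X, y)
# Normal equation: solve (X^T X) theta = X^T y directly, with no learning rate.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_gd.round(2), theta_ne.round(2), compute_cost(X, y, theta_gd))
```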
Course Section | Topic | Practice Codes |
---|---|---|
Week 3: Andrew Ng Stanford Machine Learning | Logistic Regression: Classification Hypothesis Representation Decision Boundary Cost Function Optimization | Logistic Regression Practice from scratch Logistic Regression using Library Logistic Practice examples |
Week 3: Andrew Ng Stanford Machine Learning | Regularization Overfitting Regularised Linear Regression Regularised Logistic Regression | Regularization without library |
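This is a from-scratch sketch of regularized logistic regression in the course's formulation. It uses a sigmoid hypothesis, a cross-entropy cost with an L2 penalty that leaves the intercept `theta[0]` unpenalized, and gradient descent. The dataset, `lam`, `alpha` and the iteration count are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_and_gradient(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient (theta[0] is not penalized)."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12                                   # guards against log(0)
    cost = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]             # penalty gradient, skipping the intercept
    return cost, grad


def train(X, y, lam=1.0, alpha=0.5, iterations=3000):
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        _, grad = cost_and_gradient(theta, X, y, lam)
        theta -= alpha * grad
    return theta


# Hypothetical 2-D data: class 1 when x1 + x2 > 0.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 2))
y = (features.sum(axis=1) > 0).astype(float)
X = np.column_stack([np.ones(len(y)), features])   # intercept column

theta = train(X, y)
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
print(theta.round(2), accuracy)
```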
Summary Exercise:
Based on what you have learned, complete the following project: Link
Note: Use the links below for a structured approach to a project.
Course Section | Topic | Practice Codes |
---|---|---|
Week 7: Andrew Ng Stanford Machine Learning | Support Vector Machines Optimization Large Margin Intuition Underlying Mathematics Kernels | SVM using Library 1 SVM using Library 2 SVM without Library |
Applied Data Science Course Link
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Applied Data Science Course | Intro to SciKit Learn | Environment Setup Code Tutorial Project |
Week 1: Applied Data Science Course | K Nearest Neighbors | KNN from scratch KNN using SKLearn UCI Glass detection |
Week 2: Applied Data Science Course | Introduction to Supervised Learning Overfitting and Underfitting Supervised Learning: Datasets K Nearest Neighbours | |
Week 2: Applied Data Science Course | Linear Regression: Least Squares | Boston Housing Problem - Linear Regression Simple Linear Regression |
Week 2: Applied Data Science Course | Linear Regression: Lasso | Lasso Regression without library Lasso Regression with library |
Week 2: Applied Data Science Course | Linear Regression: Polynomial | Polynomial Regression without Library |
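The "KNN from scratch" idea fits in a few lines: compute Euclidean distances to every training point, take the `k` closest, and let them vote. This is a minimal NumPy sketch with made-up points. It does no feature scaling, which matters in practice because KNN is distance-based.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote of its k nearest training points."""
    predictions = []
    for x in X_query:
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(distances)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest])
        predictions.append(votes.most_common(1)[0][0])   # label with the most votes
    return np.array(predictions)


X_train = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
                    [5.0, 5.0], [5.5, 4.8], [4.9, 5.6]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X_train, y_train, np.array([[0.3, 0.3], [5.2, 5.1]])))  # ['a' 'b']
```

An odd `k` avoids most voting ties in two-class problems. When a tie does occur, `Counter.most_common` keeps the label it encountered first.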
Course Section | Topic | Practice Codes |
---|---|---|
Week 2: Applied Data Science Course | Logistic Regression | Since it's already covered, do a quick review here. |
Week 2: Applied Data Science Course | Support Vector Machines | SVM using Library SVM with Python SVM without library |
Week 2: Applied Data Science Course | Decision Trees | Naive Bayes without library Decision tree without library |
CS229 Stanford | Ensemble Methods | Theory |
Week 4: Applied Data Science Course | Naive Bayes | Naive Bayes Sklearn Classifiers |
Course Section | Topic | Practice Codes |
---|---|---|
Week 4: Applied Data Science Course | Random Forests | Random forest without library Random forest Classifier |
Week 4: Applied Data Science Course | Dimensionality Reduction and Manifold Learning | Good Read Dimensionality reduction and classification on Hyperspectral images |
IBM's Machine Learning with Python
For this course, all exercises in Python:
Machine Learning exercise with Python (IBM)
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Machine Learning With Python | Python for Machine Learning Intro | Course Practice Problems |
Week 1: Machine Learning With Python | Supervised vs Unsupervised | Course Practice Problems |
Week 2: Machine Learning With Python | Linear Regression Model Evaluation Evaluation Metrics | Course Practice Problems |
Week 2: Machine Learning With Python | Non-Linear Regression | Course Practice Problems |
Week 3: Machine Learning With Python | K-Nearest Neighbours Intro to Classification KNN Evaluation Metrics | Course Practice Problems |
Week 3: Machine Learning With Python | Decision Trees Building Decision Trees | Course Practice Problems |
Week 3: Machine Learning With Python | Logistic Regression Logistic vs Linear Regression Logistic Regression training | Course Practice Problems |
Week 3: Machine Learning With Python | Support Vector Machine | Course Practice Problems |
Week 4: Machine Learning With Python | k-Means Clustering | Course Practice Problems |
Week 4: Machine Learning With Python | Hierarchical Clustering | Course Practice Problems |
Week 4: Machine Learning With Python | Density-Based Clustering | Course Practice Problems |
Week 5: Machine Learning With Python | Content-based Recommendation Engines Recommender Systems Collaborative Filtering | Course Practice Problems |
Week 6: Machine Learning With Python | Course Project | |
Continuing the Andrew Ng Stanford Machine Learning course on Coursera: Link
Course Section | Topic | Practice Codes |
---|---|---|
Week 4: Andrew Ng Stanford Machine Learning | Neural Networks: Representation Non-Linear Hypothesis Model Representation Intuitions | Getting used to Neural nets |
Week 5: Andrew Ng Stanford Machine Learning | Neural Networks: Learning Cost Function Backpropagation Gradient Checking Random Initialization Autonomous Driving | Deep Learning for Beginners |
Week 6: Andrew Ng Stanford Machine Learning | Machine Learning System Design Evaluating a Hypothesis Model Selection Bias vs. Variance Regularization Learning Curves | Machine Learning Systems design |
Andrew Ng Stanford Machine Learning Coursera Link
Course Section | Topic | Practice Codes |
---|---|---|
Week 8: Andrew Ng Stanford Machine Learning | Unsupervised Learning Introduction K-means Optimization Objective Initialization Picking Clusters | Hands-on unsupervised learning Unsupervised Learning Unsupervised learning without libraries |
Week 8: Andrew Ng Stanford Machine Learning | Dimensionality Reduction Data Compression Visualization PCA | Dataset Dimensionality reduction in Python Complete Project |
Week 9: Andrew Ng Stanford Machine Learning | Anomaly Detection Gaussian Distribution Anomaly Detection vs Supervised Learning Choosing Features Multivariate Gaussian | Anomaly Detection |
Week 9: Andrew Ng Stanford Machine Learning | Recommender Systems Content Based Recommendations Collaborative Filtering Vectorization | Amazon Product Recommender System |
Week 10: Andrew Ng Stanford Machine Learning | Refer to course videos | |
Week 11: Andrew Ng Stanford Machine Learning | Refer to course videos | |
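This is a minimal NumPy sketch of the Week 8 K-means procedure. It covers random initialization from training examples, alternating assignment and centroid-update steps, the distortion cost, and keeping the best of several random restarts. The synthetic blobs, the restart count and the helper names are assumptions for illustration.

```python
import numpy as np


def assign(X, centroids):
    """Index of the closest centroid for every row of X."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, iterations=100, seed=0):
    """Plain k-means; returns (centroids, labels, distortion)."""
    rng = np.random.default_rng(seed)
    # Random initialization: k distinct training examples become the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        labels = assign(X, centroids)                        # cluster assignment step
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                                # move-centroid step
        ])                                                   # (an empty cluster keeps its old centroid)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(X, centroids)
    # Distortion (the optimization objective): mean squared distance to the assigned centroid.
    distortion = np.mean(np.sum((X - centroids[labels]) ** 2, axis=1))
    return centroids, labels, distortion


rng = np.random.default_rng(7)
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
# Several random initializations; keep the run with the lowest distortion.
centroids, labels, distortion = min(
    (kmeans(X, k=3, seed=s) for s in range(50)), key=lambda result: result[2]
)
print(np.round(centroids, 1), round(distortion, 3))
```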
Continue with the DeepLearning.AI specialization
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Neural Networks and Deep Learning | What are neural networks and deep learning? Supervised Learning with Neural Networks | Course Practice Problem |
Week 2: Neural Networks and Deep Learning | Binary Classification Logistic Regression Cost Function Gradient Descent Derivatives Computation Graph | Course Practice Problem |
Week 2: Neural Networks and Deep Learning | Vectorization Broadcasting in Python Getting Started with Jupyter Notebook | Course Practice Problem |
Week 3: Neural Networks and Deep Learning | Neural network representation Vectorising Activation Functions Non-linear Activation Functions Derivatives of Activation Functions Gradient Descent for Neural Networks Backpropagation Random Initialization | Course Practice Problem |
Week 4: Neural Networks and Deep Learning | Forward Propagation Deep Representations Parameters vs Hyperparameters | Course Practice Problem |
Continue with the DeepLearning.AI specialization
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Train/Dev/Test Bias/Variance | Course Practice Problem 1 Course Practice Problem 2 |
Week 1: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Regularization Importance of Regularization Dropout Normalizing Input Vanishing/Exploding Gradients Weight Initialization Gradient Checking | Course Practice Problem |
Week 2: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Mini Batch Gradient Descent Bias Correction Momentum RMSProp | Course Practice Problem |
Week 2: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Adam optimization Learning Rate Decay Local Optima | Course Practice Problem |
Week 3: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Hyperparameter tuning | Course Practice Problem 1 Course Practice Problem 2 |
Week 3: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | Tuning Picking Hyperparameters Normalizing Activations Fitting Batch Norm Softmax | Course Practice Problem |
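The Week 2 optimizers combine into Adam. Adam keeps an exponentially weighted average of gradients (momentum) and one of squared gradients (RMSProp), and applies bias correction to both. The sketch below applies that update rule to a made-up two-variable quadratic. The values `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the commonly used defaults, while `alpha` and `steps` are chosen for this toy problem.

```python
import numpy as np


def adam_minimize(grad_fn, theta0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: momentum (first moment) plus RMSProp (second moment) with bias correction."""
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)   # weighted average of gradients (momentum)
    s = np.zeros_like(theta)   # weighted average of squared gradients (RMSProp)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        v = beta1 * v + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g ** 2
        v_hat = v / (1 - beta1 ** t)   # bias correction for the zero initialization
        s_hat = s / (1 - beta2 ** t)
        theta -= alpha * v_hat / (np.sqrt(s_hat) + eps)
    return theta


def quadratic_grad(p):
    """Gradient of the hypothetical elongated bowl f(x, y) = x^2 + 10*y^2 (minimum at the origin)."""
    return np.array([2 * p[0], 20 * p[1]])


theta_min = adam_minimize(quadratic_grad, [3.0, -2.0])
print(theta_min.round(3))
```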
Continue with the DeepLearning.AI specialization
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Structuring Machine Learning Projects | Orthogonalisation Satisficing & Optimising Metric Varying Dev/Test sets & Metrics Avoidable Bias | Course Practice Problem |
Week 2: Structuring Machine Learning Projects | Error Analysis | Course Practice Problem |
Week 2: Structuring Machine Learning Projects | Mismatched training and dev/test set | Course Practice Problem |
Week 2: Structuring Machine Learning Projects | Learning from Multiple Tasks | Course Practice Problem |
Week 2: Structuring Machine Learning Projects | End to end deep learning | Course Practice Problem |
Topic | Tutorial |
---|---|
Getting Started with Keras Modules | Video |
Keras Implementation Example (Classification Boilerplate Code) | Video Code |
Step-by-Step: Making a Model in Keras | First Neural Network in Keras |
Binary Classification In Keras | Code using NN |
Logistic Regression in Keras | Code |
End to end deep learning | Video |
Neural Networks | Code Tutorial |
Regularization Keras: L1/L2 | Theory |
Regularization Keras: DropOut | Theory |
Weight Init | Theory |
Optimizer Keras | Theory |
Learning Rate Decay | Theory |
Convolutional Network | Theory |
The concepts in the Keras table are covered in the previous sections of:
- Neural Networks and Deep Learning
- Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
Continue with the DeepLearning.AI specialization: Link
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Convolutional Neural Networks | Convolutional Neural Networks Foundation Edge Detection Padding Stride Convolution over volume Pooling | Course Practice Problem 1 Course Practice Problem 2 |
Week 2: Convolutional Neural Networks | Case Studies using ConvNets ResNets Inception MobileNet EfficientNet Transfer Learning Data Augmentation | Course Practice Problem |
Week 3: Convolutional Neural Networks | Object Localization Landmark Detection Object Detection Sliding Windows Bounding Box IoU Non-max Suppression Anchor Box YOLO Region Proposal Semantic Segmentation Transpose Convolution U-Net | Course Practice Problem |
Week 4: Convolutional Neural Networks | Face recognition & Neural style transfer Face Recognition One Shot Learning Siamese Network Triplet Loss Face Verification & Binary Classification Neural Style Transfer Deep ConvNets Learning Cost Function Content Cost Function Style Cost Function 1D & 3D Generalisations | Course Practice Problem 1 Course Practice Problem 2 |
Continue with the DeepLearning.AI specialization: Link
Course Section | Topic | Practice Codes |
---|---|---|
Week 1: Sequence Models | Recurrent Neural Networks Backpropagation through time Types of RNN Language Model & Sequence Generation Novel Sequences Vanishing Gradients with RNN, GRU, LSTM, Bidirectional RNN, Deep RNN | Course Practice Problem |
Week 2: Sequence Models | Word Representation Word Embeddings Embedding Matrix Word2Vec Negative Sampling GloVe Sentiment Classification Debiasing word embeddings | Course Practice Problem |
Week 3: Sequence Models | Various sequence to sequence architectures Basic Models Picking Most Likely Sentence Beam search Error Analysis in Beam Search BLEU Score Attention Model | Course Practice Problem |
Week 3: Sequence Models | Speech Recognition Trigger Word Detection | Course Practice Problem |
Review your progress so far with
https://github.com/mrdbourke/zero-to-mastery-ml
Then progress to Stanford CS231n: Link
- Image Classification using CNN & Keras
- Weather Forecast & Prediction by Machine Learning
- DeepMoji
- FaceSwap
- Social Distancing with AI
- Keras Deep Speech
- Linear Algebra
- What is Machine Learning?
- Supervised v Unsupervised v Reinforcement Learning
- Getting started with Jupyter Notebook
- Linear Regression
- Non-Linear Regression
- Multiple Variable Linear Regression
- Classification
- Logistic Regression
- Regularisation
- K Nearest Neighbours
- Classification Metrics
- Support Vector Machine
- Naive Bayes
- Decision Tree
- Random Forest
- Stochastic Gradient Descent
- Neural Network
- Clustering
- K Means Clustering
- Hierarchical Clustering
- Mean Shift Clustering
- DBSCAN
- EM Algorithm
Helpful Books for Reference
- Deep learning book
- Dive into Deep Learning
- Mathematics for Machine Learning
- Probabilistic Machine Learning: An Introduction
Once you have made it this far, you can move on to Kaggle competitions and take on a range of Data Science projects!