UCLA Extension Introduction to Data Science (COM SCI X450.1) class materials. Use this repo to access supplemental learning resources, eBooks, and other handouts. Additional content may be added throughout the course. Past students are always welcome!
- Becoming a machine learning company means investing in foundational technologies - Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms
- Op-Ed: The real reason we’re afraid of robots - Cogent Op-Ed relating to the "Killer AI" meme
- UVA Data Points: a podcast exploring the world of data science - Cool podcast from School of Data Science, University of Virginia
- Data Scientist vs Machine Learning Engineer - A great distinction!
- Data Is An Art, Not Just A Science—And Storytelling Is The Key - Data storytelling is key to successful data science
- What Is Causal Inference? An Introduction for Data Scientists - A great intro to this growing area
- My Journey into AI Webinar - Panel discussion hosted by DeepLearning.AI
- R vs Python! Which one should you choose for data science? - A balanced comparison of the two most popular data science languages
- Data Con LA YouTube channel - Video presentations from the Data Con LA annual conference
- Data Scientist Resume: Template, Examples and Complete Guide - What a successful data science resume looks like
- insideBIGDATA "Ask a Data Scientist" Series - My popular educational series sponsored by Intel
- All my opendatascience.com articles - Many articles keeping pace with the field of data science
- Google Dataset Search - NEW! Resource for data scientists
- The Importance of SQL in Practicing Data Science - Reinforcing my advice in class!
- What is Data Science 'Impostor Syndrome'? - Avoid the fear of what you don't know
- Becoming a Data Scientist - Important pointers by head of Kaggle Learn, Dan Becker, Ph.D.
- Industrial Research in Applied Statistics - AMS - Nice article about being a data scientist.
- 6 Reasons Why Data Science Projects Fail - A report from down in the trenches.
- The Difference Between Data Scientists and Data Engineers - A guide to becoming a unicorn.
- forester: an R package for automated building of tree-based models
- A UseR’s Introduction to Machine Learning in AWS - Nice YouTube presentation for doing ML on AWS with R
- The Hitchhiker’s Guide to Responsible Machine Learning - Educational comic book
- Feature Engineering and Selection: A Practical Approach for Predictive Models - Nice learning resource by Max Kuhn and Kjell Johnson
- 2020 Outlook on AutoML Updates & Latest Recent Advances - Latest authoritative list of AutoML tools and frameworks
- Data Science Meetup (Feb. 26, 2020) Gradient Boosting Machines (GBM): From Zero to Hero - Slides from a great Meetup
- Data Science Meetup (Feb. 26, 2020) Gradient Boosting Machines (GBM): From Zero to Hero - GitHub repos with R and Python code
- 10 Tips for Choosing the Optimal Number of Clusters - Great article that drills down into unsupervised machine learning clustering
- NGBoost: Natural Gradient Boosting for Probabilistic Prediction - HOT new machine learning algorithm using boosting
- Preventing undesirable behavior of intelligent machines - Cool research paper addressing the debate over machine learning bias
- VIDEO presentation from LA West R Meetup group - Better Than Deep Learning Gradient Boosting Machine 2019
- SLIDES from LA West R Meetup group - Better Than Deep Learning Gradient Boosting Machine 2019
- Linear Regression with Healthcare Data for Beginners in R - Nice starter exercise for newbie data scientists
- Book Review: Deep Learning Revolution - Nice deep learning book for a general audience.
- Evaluate your R model with MLmetrics - Using R’s MLmetrics to evaluate machine learning models. MLmetrics provides several functions to calculate common metrics for ML models, including AUC, precision, recall, accuracy, etc.
- Assessment Metrics for Clustering Algorithms - Metrics for clustering and unsupervised machine learning
- A Brief History of AI with Deep Learning - A complete overview of the history of AI
- R advantages over python - R has many advantages over Python that should be considered when choosing a language for data science
- Time Series Analysis in R: How to Read and Understand Time Series Data
- Tidyverse vs. Base-R: How To Choose The Best Framework For You - The base R vs. tidyverse debate rages on!
- Static and Dynamic Web Scraping with R - Overview of web scraping with R
- Nice self-contained data project example - CO2 Emissions Comparing and Modeling for Global Warming
- Presentations from So Cal R User Group Meetup - YouTube channel for the So Cal R Users Group
- [Frustration: One Year With R](https://github.com/ReeceGoding/Frustration-One-Year-With-R) - One coder's experience with R
- How to Create a Dataframe in R with 30 Code Examples - Useful data frame tutorial in R
- Getting started simulating data in R: some helpful functions and how to use them - Nice blog article on simulated data in R
- R for Data Science - Official Tidyverse book by Hadley Wickham
- TidyverseSkeptic - Tidyverse Skeptic essay by Norm Matloff
- useR! Conference - Premier global R conference
- Book R code - R code for my book "Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R"
- Python vs R in Data Science: Which one is better? - Balanced overview of the R and Python data science programming languages
- R vs Python: Different similarities and similar differences - Nice cross-comparison of the R and Python data science programming languages
- R Tutorials - Good R programming tutorials to read in parallel with weeks 1-4 of class
- Descriptive Statistics in R - Good resource for Week 7 of class on EDA
- Demystifying Regular Expressions in R - Intro to text analytics
- Text Data Analysis in R: Understanding grep, grepl, sub and gsub - Practical use of regular expressions
- Unearthing Golden Nuggets of Data: A RegEx Treasure Hunt in R - RegEx tutorial in R with stringr
- Vignette: data.table - data.table is an R package that provides an enhanced version of data.frame
- How Tidyverse Guides R Programmers Through Data Science Workflows - A cohesive way of approaching data science projects
- Type conversion and you (or and R) - More examples on type conversion and coercion in R
- Essential list of useful R packages for data scientists - Great list of important R packages for data scientists
- R plot pch symbols : The different point shapes available in R - An examination of all the popular PCH argument values for data visualizations
- R color names
- A Complete Tutorial on Time Series Modeling in R
- Web scraping 101 - A vignette for the rvest R package
- Intro to Python Programming - Kaggle course on very basics of Python language
- An Introduction to Statistical Learning: with Applications in R... with Python! - Solutions to ISL exercises in Python
- Python Programming Tutorials - Many tutorial resources for Python coding for data science and machine learning
- Talk Python - A podcast on Python and related technologies
- Theoretical Foundations of Data Science— Should I Care or Simply Focus on Hands-on Skills? - YES, you should care!
- Linear Algebra via MIT OpenCourseWare - Learn linear algebra from Gil Strang, the best of the best!
- Calculus — Multivariate Calculus And Machine Learning -- A Must Know Concept For Every Professional - Here is the bare minimum Calculus necessary for machine learning.
- All of Statistics: A Concise Course in Statistical Inference - A widely respected book on statistics with R code.
- The Book of R: A First Course in Programming and Statistics - As mentioned in class, a great way to learn statistics.
- Do my data follow a normal distribution? - A note on the most widely used distribution and how to test for normality in R.
- Fisher's exact test in R: independence test for a small sample - Focuses on the Fisher’s exact test. Independence tests are used to determine if there is a significant relationship between two categorical variables.
- Telling Stories with Data With Applications in R - Excellent data science learning resource that leans heavily on use case examples.
- Introduction to Statistical Learning with R - Great book to use following this class.
- Introduction to Statistical Learning with Python - NEW! First printing July 5, 2023.
- Elements of Statistical Learning - The "Machine Learning Bible."
- Mathematics for Machine Learning - Wonderful all-in-one text for getting up to speed with the mathematics required for machine learning.
- Linear Algebra Done Right - ML is all about Linear Algebra. Great learning resource.
- Bookdown - A collection of free eBooks on data science and R.
- Probabilistic Machine Learning: An Introduction - Machine learning built on a foundation of probability theory.
- Using NFL Analytics to Contextualize Player Performance & Predict Playoff Probability - Work done by student Jesse Lubell (Summer 2021)
- Analysis of Reptile & Amphibian Observations in Los Angeles County - Work done by student Timothy Stegman (Fall 2020)
- Analysis of Match Statistics and Team Performances in the Premier League From Season 2015/16 to Season 2019/20 - Work done by student Tara Nguyen (Fall 2020)
- What Makes Us Happy - Using Kaggle "Young People Survey" data set - Work done by student Alexander Fichtl, taking course from Germany (Spring 2020)
- Data Analysis Evolution of Popular Music - Work done by student William Toth (Spring 2020)
- Airbnb Price Prediction for different areas in NYC - Work done by student Hashneet Kaur (Winter 2020)
- Predicting-Hotel-booking-demand-and-cancellation - Work done by student Elaine Kuang (Winter 2020)
- Analysis-of-Coronavirus-COVID-19-New-Confirmed-Cases - Work done by student Micky Lee (Winter 2020)
- Data Analysis for 2019 Indian General Election - Work done by student Junhui Yang (Winter 2020)
- FIVB Beach Volleyball Historic Top 8 Teams Analysis - Work done by student Tyler Widdison (Fall 2019)
- Data Analysis for PM2.5 in Beijing - Work done by student Xiaozhu Zhang (Spring 2019)
- Daniel D. Gutierrez - LinkedIn
This project is licensed under the MIT License - see the LICENSE.md file for details