ICS 434:

1_Introduction.ipynb

This notebook introduces the basics of data science, its origins, and its key components. Data science is an area that brings together methods from statistics and computer science to handle, analyze, and get insights from data. The notebook covers the essential parts of data science: collecting data, preparing and cleaning it, exploring it to spot trends, building models to predict future trends, and visualizing data in clear and informative ways.

2_Intro_to_pandas_Python_package.ipynb

This notebook provides an overview of packages and modules in Python. It explains how packages are structured directories containing Python modules, which are individual Python files, and details how they can be imported and used to organize and reuse code efficiently in Python programming.

3_intro_to_pandas.ipynb

This notebook introduces Pandas, the leading library for data wrangling. Specifically, the notebook introduces two pivotal data structures essential for data wrangling (Series and DataFrames), and provides an in-depth exploration of indexing techniques for efficient data handling.
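
As a minimal sketch of the two data structures and of label-based versus position-based indexing (the values and labels below are made up for illustration and do not come from the notebook):

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
temps = pd.Series([71.5, 68.0, 75.2], index=["mon", "tue", "wed"])

# A DataFrame is a table of columns that share one row index.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "value": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

by_label = temps.loc["tue"]      # label-based lookup
by_position = temps.iloc[0]      # position-based lookup
cell = df.loc["b", "name"]       # row label, then column label
first_two = df.iloc[:2]          # first two rows by position
```

`loc` selects by label and `iloc` by integer position; the two can give different results when the index labels are themselves integers.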

4_exploratory_data_analysis.ipynb

This notebook provides a comprehensive introduction to exploratory data analysis using Pandas. It starts with general dataset attributes, such as the number of rows and columns and the column data types. It then covers descriptive statistics operations, such as calculating the mean and median, and explains the concept of axis in Pandas operations. Finally, it describes how missing values are handled, shows how to sort data, and concludes with practical examples of basic Pandas plots for data visualization.
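
The operations listed above might look like the following on a small toy table (the column names and values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Toy data with one missing value.
df = pd.DataFrame(
    {"height": [1.0, 2.0, np.nan, 4.0], "weight": [10.0, 20.0, 30.0, 40.0]}
)

n_rows, n_cols = df.shape          # dataset dimensions
column_types = df.dtypes           # data type of each column

col_means = df.mean(axis=0)        # one mean per column (NaN skipped by default)
row_means = df.mean(axis=1)        # one mean per row

n_missing = df["height"].isna().sum()                   # count missing values
sorted_df = df.sort_values("weight", ascending=False)   # sort rows by a column
```

With `axis=0` the operation runs down the rows and yields one result per column; with `axis=1` it runs across the columns and yields one result per row.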

5_arithmetic_ops_and_data_alignment.ipynb

This notebook provides a thorough overview of vectorization in Pandas and demonstrates the efficiency of vectorized operations over traditional loops, the concept of broadcasting in array manipulation, and how to apply arithmetic and comparison operations effectively in Pandas. Additionally, the notebook covers data querying and subsetting, highlighting the ease and speed of handling large datasets with these techniques.
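
A short sketch of a vectorized operation, scalar broadcasting, and boolean subsetting, using an invented three-row table:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Vectorized arithmetic: applied element-wise with no explicit Python loop.
df["total"] = df["price"] * df["qty"]

# Broadcasting: the scalar 0.9 is applied to every element of the column.
df["discounted"] = df["price"] * 0.9

# Comparison operations return a boolean Series that can subset the rows.
mask = df["total"] > 15
subset = df[mask]
```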

6_0_summary_statistics.ipynb

This notebook offers a concise overview of summary statistics, essential for data analysis. It covers key concepts like central tendency measures (mean, median, mode). The notebook also discusses measures of variability (range, variance, standard deviation) and quantiles (quartiles, percentiles) and highlights their role in describing data distribution.
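
For a quick illustration, these measures can be computed with Python's standard `statistics` module (the notebook itself may use Pandas; the data below is made up):

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability
value_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation

# Quartiles: the three cut points that split the data into four parts
quartiles = statistics.quantiles(data, n=4)
```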

12_intro_probability.ipynb

This Jupyter Notebook serves as an introduction to basic probability concepts and terminology. It also introduces a simulation technique to illustrate the long-term frequency of events by exploring a simple problem.
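
The problem used in the notebook is not named here, so the sketch below assumes a coin toss; `running_frequency` is a hypothetical helper that tracks how the observed frequency of an event settles toward its probability as trials accumulate:

```python
import random


def running_frequency(n_trials, p=0.5, seed=0):
    """Return the observed frequency of success after each trial."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    successes = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:       # event occurs with probability p
            successes += 1
        freqs.append(successes / i)
    return freqs


freqs = running_frequency(10_000)
```

Early entries of `freqs` fluctuate widely, while later entries stay close to `p`.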

13_probability_distributions_binomial.ipynb

This Jupyter Notebook introduces the binomial probability distribution, providing a comprehensive exploration through practical examples.
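
As a small worked example, the binomial probability mass function P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be written directly (the function name is an illustrative choice, not taken from the notebook):

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Distribution of the number of heads in 10 fair coin tosses.
pmf = [binomial_pmf(k, 10, 0.5) for k in range(11)]
```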

14_probability_distributions_gaussian.ipynb

This Jupyter Notebook introduces the Gaussian probability distribution, providing a comprehensive exploration through practical examples.
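
A brief sketch using the standard library's `NormalDist` (the notebook may rely on a different library) shows the density at the mean and the probability of falling within one standard deviation:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

density_at_mean = dist.pdf(0.0)                       # peak of the bell curve
within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)     # roughly 68% of the mass
```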

15_kernel_density_estimation.ipynb

This Jupyter Notebook introduces kernel density estimation. It starts with an overview of histograms and their limitations, then moves on to the concept and application of kernel density estimation as a more effective method for estimating the probability density function of a random variable.
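
To make the idea concrete, here is a hand-rolled sketch that assumes a Gaussian kernel and a fixed bandwidth; `gaussian_kde` is a hypothetical helper written for this example, and the samples are invented:

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from one Gaussian bump per sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [1.0, 1.5, 2.0, 4.0, 4.5]
density = gaussian_kde(samples, bandwidth=0.5)
```

Unlike a histogram, the estimate is smooth and does not depend on where bin edges happen to fall.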

16_KDE_bandwidth.ipynb

This Jupyter Notebook focuses on the estimation of bandwidth in kernel density estimation, detailing the methodologies and considerations involved in selecting an optimal bandwidth to accurately approximate the probability density function of a dataset.
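
The notebook's own selection method is not stated here. As one common starting point, the sketch below uses Silverman's rule of thumb for a Gaussian kernel, h = 0.9 · min(σ, IQR / 1.34) · n^(-1/5); the function name and data are assumptions for illustration:

```python
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for a Gaussian kernel (Silverman's rule)."""
    n = len(samples)
    std = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(std, (q3 - q1) / 1.34)   # robust estimate of spread
    return 0.9 * spread * n ** (-1 / 5)


h = silverman_bandwidth([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
```

A bandwidth that is too small produces a spiky estimate, and one that is too large smooths away real structure, which is why rules like this are usually treated as a first guess.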

17_probability_distributions_poisson.ipynb

This Jupyter Notebook introduces the Poisson probability distribution, providing a comprehensive exploration through practical examples.
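
As a small worked example, the Poisson probability mass function P(X = k) = e^(-λ) λ^k / k! can be written directly (the function name and the rate of 3 are illustrative assumptions):

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Probabilities of 0..50 events when 3 events are expected on average.
pmf = [poisson_pmf(k, 3.0) for k in range(51)]
expected_count = sum(k * p for k, p in enumerate(pmf))
```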

18_param_estimation_bootstrap.ipynb

This Jupyter Notebook covers parameter estimation with a focus on Bootstrap Confidence Intervals, explaining the process and techniques for estimating confidence intervals using the bootstrap method.
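
A minimal sketch of a percentile bootstrap confidence interval follows. It assumes the percentile variant (the notebook may use another), and `bootstrap_ci` and the sample values are made up for illustration:

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


sample = [2.1, 3.4, 2.9, 4.0, 3.3, 2.7, 3.8, 3.1, 2.5, 3.6]
lower, upper = bootstrap_ci(sample)
```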

19_param_esitmation_maximum_likelihood.ipynb

This Jupyter Notebook presents parameter estimation through maximum likelihood (ML). It provides a practical understanding of likelihood and delves into the concept and significance of log-likelihood in optimizing parameter estimates.
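
To illustrate, the sketch below assumes a Bernoulli model (7 successes in 10 trials, invented for the example) and finds the success probability that maximizes the log-likelihood by a simple grid search:

```python
import math


def log_likelihood(p, successes, trials):
    """Log-likelihood of success probability p for Bernoulli data."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


# Candidate values of p strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, 7, 10))
```

Working with the log turns a product of many small probabilities into a sum, which avoids numerical underflow and leaves the location of the maximum unchanged.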

9_group_by.ipynb

This Jupyter Notebook explores the groupby method, focusing on the split-apply-combine strategy for data aggregation, transformation, filtering, and thinning within groups. It offers a concise examination of how to efficiently manage and analyze grouped data in Python.
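
The four kinds of group operation named above can be sketched on a toy table (column names and values are assumptions; `head(1)` is used here as one possible way to thin each group):

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "score": [1.0, 3.0, 2.0, 4.0, 6.0]}
)
grouped = df.groupby("team")["score"]

# Aggregation: one value per group.
group_means = grouped.mean()

# Transformation: a result with the same shape as the input.
centered = df["score"] - grouped.transform("mean")

# Filtering: keep or drop whole groups based on a group-level test.
big_groups = df.groupby("team").filter(lambda g: len(g) >= 3)

# Thinning: keep only some rows from each group.
first_rows = df.groupby("team").head(1)
```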

10_hierarchical_indexes.ipynb

This Jupyter Notebook introduces Hierarchical Indexing, expanding upon its mention in our groupby discussions. It details how to implement multiple indexes on rows and/or columns. The concept of levels within a MultiIndex object is also explored, providing a foundational understanding of structured data manipulation and analysis.
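
A short sketch of a two-level row index, with invented level names and values:

```python
import pandas as pd

# Two index levels: year (outer) and quarter (inner).
index = pd.MultiIndex.from_product(
    [["2023", "2024"], ["q1", "q2"]], names=["year", "quarter"]
)
sales = pd.Series([10, 20, 30, 40], index=index)

one_year = sales.loc["2024"]                        # select on the outer level
one_cell = sales.loc[("2023", "q2")]                # select on both levels
by_quarter = sales.groupby(level="quarter").sum()   # aggregate over one level
wide = sales.unstack("quarter")                     # move a level into columns
```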

21_hypothesis_testing_normal.ipynb

This Jupyter Notebook introduces the concept of multiple testing using bootstrap methods. It guides you through building a background distribution via bootstrapping (sampling repeatedly with replacement) to estimate variability. It then compares actual data against this distribution to distinguish statistically significant results from those that could occur by chance.

22_hypothesis_testing_multi_categories.ipynb

This Jupyter Notebook explores the technique of comparing proportions using bootstrap methods. It demonstrates how to create a simulated distribution of sample proportions through repeated bootstrapping, then compares these proportions to actual data to determine if observed differences are statistically significant or likely due to random variation.
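
One way this comparison might be set up is sketched below. It assumes two groups, a pooled null hypothesis, and a two-sided test; the function name and the counts (30 of 100 versus 50 of 100) are invented for illustration and are not the notebook's data:

```python
import random


def bootstrap_proportion_test(succ_a, n_a, succ_b, n_b, n_resamples=2000, seed=0):
    """Estimate how often chance alone gives a gap as large as the observed one."""
    rng = random.Random(seed)
    observed = succ_b / n_b - succ_a / n_a
    # Under the null hypothesis both groups share one pooled success rate.
    pooled = [1] * (succ_a + succ_b) + [0] * (n_a + n_b - succ_a - succ_b)
    extreme = 0
    for _ in range(n_resamples):
        sim_a = sum(rng.choices(pooled, k=n_a)) / n_a
        sim_b = sum(rng.choices(pooled, k=n_b)) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(sim_b - sim_a) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_resamples


observed, p_value = bootstrap_proportion_test(30, 100, 50, 100)
```

A small `p_value` means the simulated differences rarely reach the observed one, so random variation alone is an unlikely explanation.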

25_correlation.ipynb

This Jupyter Notebook introduces the concept of correlation analysis. It explains how to calculate and interpret correlation coefficients, helping you understand the strength and direction of relationships between two variables. It also explains how to interpret the R-squared statistic, which is common throughout machine learning models.
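
As a worked example, the Pearson correlation coefficient can be computed by hand on a few made-up points; for a simple linear fit with one predictor, R-squared equals the square of this coefficient:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_squared = r**2   # share of the variance in ys explained by a line in xs
```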

26_linear_regression.ipynb

This Jupyter Notebook introduces the basics of linear regression. It walks you through the steps of fitting a linear model to data, helping you understand how to predict one variable based on another. The notebook includes simple, practical examples.
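
A minimal sketch of ordinary least squares with one predictor, written without any libraries (`fit_line` and the data points are illustrative assumptions):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted = intercept + slope * 6   # prediction for a new x value
```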

27_non_linear_regression.ipynb

This Jupyter Notebook delves into non-linear regression, tailored for beginners in data science. It explains how to model relationships between variables that don't follow a straight line, using more complex functions. The notebook provides examples to illustrate the fitting of non-linear models to data, helping you grasp the basics of this important statistical technique.
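
The functions used in the notebook are not listed here. As one simple case, the sketch below assumes a quadratic relationship and fits it with NumPy's `polyfit`; the data is generated from a known curve so the recovered coefficients can be checked:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# Noiseless toy data from a known quadratic: y = 2x^2 - 3x + 1.
y = 2.0 * x**2 - 3.0 * x + 1.0

# Fit a degree-2 polynomial; coefficients come back highest power first.
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)   # fitted values at the observed x
```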

28_time_series_regression_based.ipynb

This Jupyter Notebook introduces the fundamentals of time series regression. It guides you through identifying trends (linear or non-linear) and seasonal patterns in time series data, and then models these characteristics to make forecasts. The notebook offers step-by-step examples to clearly demonstrate how to analyze and model time-related data effectively.

29_exponential_smoothing.ipynb

This Jupyter Notebook explores exponential smoothing techniques, including single, double, and triple smoothing methods. It teaches how to apply these techniques to forecast data, adjusting for level, trend, and seasonality. The notebook provides straightforward examples to help you understand and implement exponential smoothing.
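
A sketch of single (simple) exponential smoothing, where each smoothed value is a weighted average of the newest observation and the previous smoothed value; initializing with the first observation is an assumption, and the series is made up:

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    level = series[0]            # initialize with the first observation
    smoothed = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


smoothed = simple_exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5)
forecast = smoothed[-1]          # flat forecast for the next period
```

Double and triple smoothing extend the same recursion with additional terms for trend and seasonality.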

30_clustering.ipynb

This Jupyter Notebook introduces clustering techniques, focusing on k-means and hierarchical clustering. It explains how to group data based on similarities using simple Euclidean or non-Euclidean distances. The notebook includes practical examples to demonstrate both methods and introduces the silhouette coefficient to evaluate the quality of the clustering.
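
A bare-bones sketch of k-means with Euclidean distance (Lloyd's algorithm) follows. The starting centroids are supplied by hand, and `kmeans` and the points are assumptions made for this example:

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```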

31_mixture_models.ipynb

This Jupyter Notebook explores the use of mixture models for clustering. It focuses on implementing the Expectation-Maximization (EM) algorithm to classify data into two clusters and discusses methods for extending this approach to more than two clusters.
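
The sketch below shows what the EM loop might look like for two one-dimensional Gaussian components. It is a simplified illustration with invented data and starting values, not the notebook's implementation, and it has no safeguards against a component collapsing to zero variance:

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_gaussians(data, mu, sigma, weight, n_iter=50):
    """mu and sigma are 2-element lists; weight is the share of component 0."""
    n = len(data)
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 0.
        resp = []
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], sigma[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate parameters from the weighted points.
        n0 = sum(resp)
        n1 = n - n0
        mu = [
            sum(r * x for r, x in zip(resp, data)) / n0,
            sum((1 - r) * x for r, x in zip(resp, data)) / n1,
        ]
        sigma = [
            math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0),
            math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1),
        ]
        weight = n0 / n
    return mu, sigma, weight


data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.1, 5.2, 4.8]
mu, sigma, weight = em_two_gaussians(data, mu=[0.0, 6.0], sigma=[1.0, 1.0], weight=0.5)
```

Each point can then be assigned to whichever component gives it the higher responsibility.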