# Importing the necessary packages
import numpy as np                                  # "Scientific computing"
import scipy.stats as stats                         # Statistical tests

import pandas as pd                                 # Data Frame
from pandas.api.types import CategoricalDtype

import random
import math

import matplotlib.pyplot as plt                     # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic  # Mosaic diagram
import seaborn as sns                                # Advanced data visualisation
from sklearn.linear_model import LinearRegression
import altair as alt                                # Alternative visualisation system

Student $t$-distribution in Python
Import scipy.stats
For a $t$-distribution with d degrees of freedom (passed as df=d):

| Function | Purpose |
| --- | --- |
| stats.t.pdf(x, df=d) | Probability density at $x$ |
| stats.t.cdf(x, df=d) | Left-tail probability $P(X < x)$ |
| stats.t.sf(x, df=d) | Right-tail probability $P(X > x)$ |
| stats.t.isf(1-p, df=d) | Value below which a proportion p of observations is expected to fall |
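The four functions above can be tried out as follows. This is a minimal sketch: the values of `d`, `x`, and `p` are arbitrary illustrations, not taken from the course material.

```python
import scipy.stats as stats

d = 10      # degrees of freedom (illustrative value)
x = 1.5     # an arbitrary t-score
p = 0.95    # proportion of observations below the cut-off

density = stats.t.pdf(x, df=d)      # height of the density curve at x
left = stats.t.cdf(x, df=d)         # P(X < x)
right = stats.t.sf(x, df=d)         # P(X > x), equal to 1 - cdf
cutoff = stats.t.isf(1 - p, df=d)   # value with a proportion p below it

print(density, left, right)
print(cutoff)   # same value as stats.t.ppf(p, df=d)
```

Note that `isf(1 - p)` and `ppf(p)` return the same cut-off, so either can be used for critical values.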

Normal distribution in Python

Import scipy.stats
For a normal distribution with mean m and standard deviation s:

| Function | Purpose |
| --- | --- |
| stats.norm.pdf(x, loc=m, scale=s) | Probability density at $x$ |
| stats.norm.cdf(x, loc=m, scale=s) | Left-tail probability $P(X < x)$ |
| stats.norm.sf(x, loc=m, scale=s) | Right-tail probability $P(X > x)$ |
| stats.norm.isf(1-p, loc=m, scale=s) | Value below which a proportion p of observations is expected to fall |
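A short sketch of the same four calls for a normal distribution. The mean `m`, standard deviation `s`, and the value `x` are made-up illustration values.

```python
import scipy.stats as stats

m, s = 100, 15   # illustrative mean and standard deviation
x = 115          # one standard deviation above the mean
p = 0.975        # proportion of observations below the cut-off

density = stats.norm.pdf(x, loc=m, scale=s)      # density at x
left = stats.norm.cdf(x, loc=m, scale=s)         # P(X < x)
right = stats.norm.sf(x, loc=m, scale=s)         # P(X > x)
cutoff = stats.norm.isf(1 - p, loc=m, scale=s)   # value with a proportion p below it

print(left, right)
print(cutoff)
```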

graph LR
A[Data Characteristics] -- Sample Size < 30 --> B[t-test]
A -- Sample Size >= 30 --> C[z-test]
A -- Sample Size Unknown --> C
B -- Population Distribution Unknown --> C
B -- Population Distribution Known and Normally Distributed --> C
C -- Variances Equal and Known --> D[z-test]
C -- Variances Unequal or Unknown --> B


Requirements z-test:

- Random sample
- Sample large enough (n >= 30)
  - if the population is normally distributed, the sample size is not relevant
- Normally distributed
- Population standard deviation is known

If any one of these is not met, use the t-test, provided the population is normally distributed.
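The difference between the two tests can be sketched with a right-tailed one-sample test. The sample values, the hypothesised mean `mu0`, and the "known" population standard deviation `sigma` are all invented for illustration.

```python
import math
import statistics
import scipy.stats as stats

sample = [5.2, 4.8, 5.5, 5.1, 5.6, 4.9, 5.3, 5.4]   # hypothetical measurements
mu0 = 5.0      # hypothesised population mean (H0: mu = mu0, H1: mu > mu0)
sigma = 0.3    # population standard deviation, assumed known for the z-test

n = len(sample)
mean = statistics.mean(sample)

# z-test: uses the known population standard deviation
z = (mean - mu0) / (sigma / math.sqrt(n))
p_z = stats.norm.sf(z)

# t-test: sigma unknown, so estimate it with the sample standard deviation
s = statistics.stdev(sample)
t = (mean - mu0) / (s / math.sqrt(n))
p_t = stats.t.sf(t, df=n - 1)

print(z, p_z)
print(t, p_t)
```

In both cases H0 is rejected when the p-value is below the chosen significance level.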

H4 -> 2 qualitative variables

H0 -> there is no association between the 2 variables
H1 -> there is an association between the 2 variables

The chi-squared test is used to assess associations between categorical variables.

Cramér's V measures the strength of this association.

The goodness-of-fit test checks whether the observed frequencies match the expected theoretical distribution.
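A minimal sketch of the three techniques with `scipy.stats`. The contingency table and the die-roll counts are hypothetical data, and Cramér's V is computed by hand from the chi-squared statistic.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical contingency table: rows = group, columns = answer category
observed = np.array([[20, 30, 10],
                     [30, 20, 40]])

# Chi-squared test of independence (H0: no association)
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Cramér's V: chi2 scaled by sample size and the smaller table dimension
n = observed.sum()
k = min(observed.shape)
cramers_v = np.sqrt(chi2 / (n * (k - 1)))

# Goodness-of-fit: do 120 hypothetical die rolls match a fair die?
rolls = [18, 22, 20, 25, 15, 20]
gof = stats.chisquare(f_obs=rolls, f_exp=[20] * 6)

print(chi2, p_value, dof, cramers_v)
print(gof.statistic, gof.pvalue)
```

Cramér's V always lies between 0 (no association) and 1 (perfect association).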

H5 -> 1 qualitative variable and 1 quantitative variable

Use the t-test for independent samples when comparing the means of two independent groups or conditions.

2 independent groups
compare the means of 2 groups (not necessarily the same size)
means of 2 different groups
Example: a placebo group and a group receiving the medicine
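A sketch of the placebo versus medicine comparison using `stats.ttest_ind`. The measurements are invented, and `equal_var=False` (Welch's variant) is an assumption chosen here because the groups need not have equal variances.

```python
import scipy.stats as stats

# Hypothetical measurements for two independent groups of different size
placebo = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3]
medicine = [6.2, 6.5, 6.1, 6.4, 6.6, 6.0, 6.3]

# Welch's t-test: does not assume equal variances in both groups
result = stats.ttest_ind(placebo, medicine, equal_var=False)

print(result.statistic, result.pvalue)
```

The reported p-value is two-sided; a positive statistic means the first group has the higher sample mean.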

Use the paired t-test when analyzing paired or matched observations to compare means within the same group under different conditions or time points.

2 dependent groups
same test subjects
Example: the same car with different types of petrol
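The car example could look like this with `stats.ttest_rel`. The fuel consumption figures are made up; each position in the two lists refers to the same car.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical fuel consumption (l/100 km) of the same 5 cars on two fuels
fuel_a = [6.1, 5.8, 7.2, 6.5, 6.9]
fuel_b = [5.9, 5.7, 6.8, 6.4, 6.5]

# Paired t-test: tests whether the mean of the per-car differences is zero
result = stats.ttest_rel(fuel_a, fuel_b)

# The same test expressed as a one-sample t-test on the differences
differences = np.array(fuel_a) - np.array(fuel_b)
check = stats.ttest_1samp(differences, 0)

print(result.statistic, result.pvalue)
print(check.statistic, check.pvalue)
```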

Use Cohen's d as a measure of effect size to interpret the practical significance of the observed difference and compare effect sizes across different studies or conditions.

Effect size -> how large is the difference between the 2 groups
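One common way to compute Cohen's d is the difference in means divided by the pooled standard deviation. The helper `cohen_d` below is a hypothetical name for this sketch, not a library function, and the input data are invented.

```python
import math
import statistics

def cohen_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)   # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical groups
d = cohen_d([2, 4, 6], [1, 3, 5])
print(d)
```

A frequently quoted rule of thumb reads |d| around 0.2 as small, 0.5 as medium, and 0.8 as large.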

H6 -> 2 quantitative variables

dependent variable -> y
independent variable -> x

Use regression analysis when you want to understand the relationship between a dependent variable and independent variables, and predict the value of the dependent variable.
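A minimal simple linear regression sketch. It uses `scipy.stats.linregress` as a lightweight stand-in (an assumption of this example; sklearn's `LinearRegression` would serve the same purpose), and the data points are invented.

```python
import scipy.stats as stats

# Hypothetical data: x is the independent variable, y the dependent variable
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit the line y = intercept + slope * x by least squares
fit = stats.linregress(x, y)

# Predict the dependent variable for a new value of x
new_x = 6
prediction = fit.intercept + fit.slope * new_x

print(fit.slope, fit.intercept)
print(prediction)
```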

Use the correlation coefficient when you want to measure the strength and direction of the linear relationship between two variables.

r -> 0 -> no correlation -> the points are scattered
r -> 1 -> positive correlation -> all points lie on a single rising line
r -> -1 -> negative correlation -> all points lie on a single falling line
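The three cases can be reproduced with `stats.pearsonr`. The data sets are constructed for illustration so that r comes out at the extremes.

```python
import scipy.stats as stats

x = [1, 2, 3, 4, 5]

rising = [2, 4, 6, 8, 10]      # points on a rising line
falling = [10, 8, 6, 4, 2]     # points on a falling line
scattered = [2, 1, 3, 1, 2]    # no linear pattern

r_up, _ = stats.pearsonr(x, rising)
r_down, _ = stats.pearsonr(x, falling)
r_none, _ = stats.pearsonr(x, scattered)

print(r_up, r_down, r_none)
```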

Use the coefficient of determination (R-squared) to assess the model fit, compare models, and interpret the proportion of variance explained by the independent variables.

R² -> 0 -> weak correlation (the model explains little of the variance)
R² -> 1 -> strong correlation (the model explains almost all of the variance)
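R² can be computed directly from the residuals of a fitted line. The sketch below uses `np.polyfit` for the fit and invented data; for simple linear regression the result equals the square of the correlation coefficient.

```python
import numpy as np

# Hypothetical data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares line through the points
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x

ss_res = np.sum((y - predicted) ** 2)   # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)
```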

H7 -> time series

moving averages
- simple moving average
- weighted moving average
  - exponential moving average
exponential smoothing
- single exponential smoothing -> exponential smoothing
  - no trend or seasonality
- double exponential smoothing -> Holt's method
  - trend
- triple exponential smoothing -> Holt-Winters method
  - trend and seasonality
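The first few methods in this list can be sketched in plain Python. The function names, the initialisation choices (first observation as starting level, first difference as starting trend), and the smoothing parameters are assumptions of this example; Holt-Winters is left out for brevity.

```python
def simple_moving_average(values, window):
    """Mean of the last `window` observations, for each position where it is defined."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def exponential_smoothing(values, alpha):
    """Single exponential smoothing: suited to series without trend or seasonality."""
    smoothed = [values[0]]                      # start from the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def holt_forecast(values, alpha, beta):
    """Double exponential smoothing (Holt): returns a one-step-ahead forecast."""
    level = values[0]
    trend = values[1] - values[0]               # initial trend: first difference
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + trend

series = [1, 2, 3, 4, 5]                        # hypothetical series with a linear trend
print(simple_moving_average(series, 3))
print(exponential_smoothing(series, 0.5))
print(holt_forecast(series, 0.5, 0.5))
```

On a series with a trend, single exponential smoothing lags behind the data, while Holt's method follows the trend.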
