---
title: "Midterm Exam"
author: "Monikrishna Roy"
date: "`r Sys.Date()`"
output:
  html_document:
    keep_md: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(
fig.path = "README_figs/README-mid-"
)
library(bootstrap)
library(ggplot2)
library(gridExtra)
set.seed(02262021)
# 3.2, 3.4, 3.9, 3.10 (bonus), 3.11, 3.14, 3.16, 4.2
# On Problem 3.4, please use either the Inverse Transform Method, Acceptance-Rejection method, or a transformation method.
```
## Ex 3.2
### Problem
The standard Laplace distribution has density $f(x) = \frac{1}{2}e^{-|x|},\ x \in \mathbb{R}$. Use the inverse transform method to generate a random sample of size $1000$ from this distribution. Use one of the methods shown in this chapter to compare the generated sample to the target distribution.
### Solution
A random sample of size 1000 is generated with the inverse transform method. The histogram and the Laplace density curve are overlaid to compare the sample with the target distribution.
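The inverse CDF used in the code follows from the Laplace CDF:
$$F(x) = \begin{cases} \frac{1}{2}e^{x}, & x < 0, \\ 1 - \frac{1}{2}e^{-x}, & x \ge 0, \end{cases} \qquad F^{-1}(u) = \begin{cases} \log(2u), & u < \frac{1}{2}, \\ -\log(2(1-u)), & u \ge \frac{1}{2}. \end{cases}$$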
```{r}
# inverse transform to generate a random sample
laplace_quantile <- function (size) {
  u <- runif(size)
  ifelse(u > 0.5, -log(2 - 2 * u), log(2 * u))
}
# Laplace density function
laplace_density <- function (v) {
  1 / 2 * exp(-abs(v))
}
# initialize variables
size <- 1000
range <- c(-10, 10)
x_seq <- seq(range[1], range[2], length.out = size)
# generate the density values and the random sample
x_density <- laplace_density(x_seq)
y_random <- laplace_quantile(size)
# histogram and density curve to compare the result
hist(y_random, probability = TRUE, breaks = 100, xlim = range)
lines(x_seq, y = x_density)
```
## Ex 3.4
### Problem
The Rayleigh density [162, Ch. 18] is
$$
f(x) = \frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}, \quad x \ge 0,\ \sigma > 0.
$$
Develop an algorithm to generate random samples from a $\mathrm{Rayleigh}(\sigma)$ distribution. Generate $\mathrm{Rayleigh}(\sigma)$ samples for several choices of $\sigma > 0$ and check that the mode of the generated samples is close to the theoretical mode $\sigma$ (check the histogram).
### Solution
Random samples are generated from the Rayleigh distribution by the inverse transform method; the quantile function is $F^{-1}(u) = \sigma\sqrt{-2\ln(1-u)}$.
The `ggplot2` and `gridExtra` packages are used to draw histograms for several choices of $\sigma > 0$. The estimated mode of each sample agrees well with the corresponding histogram.
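The quantile function follows by inverting the Rayleigh CDF: integrating the density gives
$$F(x) = 1 - e^{-x^2/(2\sigma^2)}, \quad x \ge 0,$$
and solving $u = F(x)$ for $x$ yields $F^{-1}(u) = \sigma\sqrt{-2\ln(1-u)}$, which is what the generator below implements.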
```{r}
# Inverse transform method to generate random sample
rayleigh_random <- function (size, sigma) {
u <- runif(size)
sqrt(-2 * sigma ^ 2 * log(1 - u))
}
size <- 1000
# generate a collection of Sigma
sigmas <- 1:5
# generating the sample
xs <- sapply(sigmas, function (v) rayleigh_random(size, v))
# convert to matrix
mat <- matrix(xs, ncol = length(sigmas), dimnames = list(NULL, sigmas))
# convert to data frame
df <- data.frame(mat)
# draw a histogram for each sigma
p1 = qplot(df$X1, main = paste0("Sigma = ", sigmas[1]), binwidth=0.1, geom="histogram")
p2 = qplot(df$X2, main = paste0("Sigma = ", sigmas[2]), binwidth=0.1, geom="histogram")
p3 = qplot(df$X3, main = paste0("Sigma = ", sigmas[3]), binwidth=0.1, geom="histogram")
p4 = qplot(df$X4, main = paste0("Sigma = ", sigmas[4]), binwidth=0.1, geom="histogram")
p5 = qplot(df$X5, main = paste0("Sigma = ", sigmas[5]), binwidth=0.1, geom="histogram")
grid.arrange(p1, p2, p3, p4, p5)
# Estimate the mode from a kernel density estimate; a frequency-based
# mode is meaningless here because continuous samples have no repeated values.
mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}
sapply(df, mode)
```
## Ex 3.9
### Problem
The rescaled Epanechnikov kernel [92] is a symmetric density function
$$f_e(x) = \frac{3}{4}(1 - x^2), \quad |x| \le 1. $$
Devroye and Györfi [74, p. 236] give the following algorithm for simulation from this distribution. Generate iid $U_1, U_2, U_3 \sim \mathrm{Uniform}(-1, 1)$. If $|U_3| \ge |U_2|$ and $|U_3| \ge |U_1|$, deliver $U_2$; otherwise deliver $U_3$. Write a function to generate random variates from $f_e$, and construct the histogram density estimate of a large simulated random sample.
### Solution
A sample is generated with the given selection algorithm, and a histogram density estimate of the simulated sample is constructed.
```{r}
# generate iid Uniform(-1, 1) variates and apply the selection rule
iid_gen <- function (size) {
u1 <- runif(size, min = -1, max = 1)
u2 <- runif(size, min = -1, max = 1)
u3 <- runif(size, min = -1, max = 1)
sapply(1:size, function(v)
if (abs(u3[v]) >= abs(u2[v]) && abs(u3[v]) >= abs(u1[v])) u2[v] else u3[v]
)
}
# generate the sample
size <- 1000
random_sample <- iid_gen(size = size)
# histogram of the random sample
hist(random_sample, probability = TRUE, breaks=20)
```
## Ex 3.10
### Problem
Prove that the algorithm given in Exercise 3.9 generates variates from the density $f_e$ (3.9).
### Solution
As an empirical check, the density $f_e$ is drawn over the histogram from Exercise 3.9; the close agreement supports the claim that the generated variates follow $f_e$.
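A sketch of the analytical proof: let $V_i = |U_i| \sim \mathrm{Uniform}(0, 1)$ and let $M$ denote the magnitude of the delivered variate. For $0 \le x \le 1$,
$$P(M \le x) = P(V_2 \le x,\ V_3 \ge V_1,\ V_3 \ge V_2) + P(V_3 \le x) - P(V_3 \le x,\ V_3 \ge V_1,\ V_3 \ge V_2) = \left(\frac{x}{2} - \frac{x^3}{6}\right) + x - \frac{x^3}{3} = \frac{3}{2}x - \frac{x^3}{2}.$$
Differentiating gives the magnitude density $\frac{3}{2}(1 - x^2)$ on $[0, 1]$, and since the sign of the delivered variate is symmetric about zero, the output has density $\frac{3}{4}(1 - x^2)$ on $|x| \le 1$, which is $f_e$.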
```{r}
# density function of Ex 3.9
epanechnikov_density <- function (x) {
3/4 * (1 - x^2)
}
xlim <- range(random_sample)
x_seq <- seq(from = xlim[1], to = xlim[2], by = 0.01)
# histogram of Ex 3.9
hist(random_sample, probability = TRUE, breaks=20)
# density curve of the function f_e
lines(x = x_seq, y = epanechnikov_density(x_seq))
```
## Ex 3.11
### Problem
Generate a random sample of size 1000 from a normal location mixture. The components of the mixture have $N(0, 1)$ and $N(3, 1)$ distributions with mixing probabilities $p_1$ and $p_2 = 1 - p_1$. Graph the histogram of the sample with density superimposed, for $p_1 = 0.75$. Repeat with different values for $p_1$ and observe whether the empirical distribution of the mixture appears to be bimodal. Make a conjecture about the values of $p_1$ that produce bimodal mixtures.
### Solution
After constructing histograms for several values of $p_1$, bimodality appears roughly in the range $0.4 \le p_1 \le 0.6$; outside this range one component dominates and a single mode remains.
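For reference, the mixture density being sampled is
$$f(x) = p_1\,\phi(x) + (1 - p_1)\,\phi(x - 3),$$
where $\phi$ is the standard normal density. A second mode can emerge only when neither component dominates, which is consistent with bimodality appearing for $p_1$ near $1/2$.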
```{r}
# generate the normal location mixture
mixture <- function(n, p) {
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 3, 1)
r <- sample(c(0, 1), n, replace = TRUE, prob = c(1 - p, p))
r * x1 + (1 - r) * x2
}
# set the size
size <- 1000
# for p1 = 0.75
m_1 <- mixture(size, 0.75)
hist(m_1, main = expression(p == 0.75), xlab='', ylab='')
par(mfrow=c(2,2)) #creates 2x2 'matrix' of plots
# construct histogram for different p1
m_2 <- mixture(size, 0.25)
hist(m_2, main = expression(p == 0.25), xlab='', ylab='')
m_2 <- mixture(size, 0.4)
hist(m_2, main = expression(p == 0.4), xlab='', ylab='')
m_3 <- mixture(size, 0.5)
hist(m_3, main = expression(p == 0.5), xlab='', ylab='')
m_3 <- mixture(size, 0.6)
hist(m_3, main = expression(p == 0.6), xlab='', ylab='')
```
## Ex 3.14
### Problem
Generate 200 random observations from the 3-dimensional multivariate normal distribution having mean vector $\mu = (0, 1, 2)$ and covariance matrix
$$\Sigma = \begin{bmatrix}
1.0 & -0.5 & 0.5\\
-0.5 & 1.0 & -0.5\\
0.5 & -0.5 & 1.0\\
\end{bmatrix}$$
using the Choleski factorization method. Use the R pairs plot to graph an array of scatter plots for each pair of variables. For each pair of variables, (visually) check that the location and correlation approximately agree with the theoretical parameters of the corresponding bivariate normal distribution.
### Solution
Multivariate normal samples are generated via the Choleski factorization, and the pairs plot shows that the sample locations and correlation patterns agree with the theoretical parameters.
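The method relies on the fact that if $\Sigma = LL^T$ is the Choleski factorization ($L$ lower triangular) and $Z \sim N_d(0, I)$, then
$$X = \mu + LZ \sim N_d(\mu, \Sigma),$$
since $\mathrm{Cov}(LZ) = L\,\mathrm{Cov}(Z)\,L^T = LL^T = \Sigma$.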
```{r}
runCholeski <- function(size, mus, sigma) {
d <- length(mus)
q <- t(chol(sigma))
z <- matrix(rnorm(n = d * size), ncol = size, nrow = d)
t(q %*% z + mus)
}
mus <- 0:2
size <- 200
# covariance matrix from the problem statement
sigma <- matrix(c(1, -0.5, 0.5, -0.5, 1, -0.5, 0.5, -0.5, 1), ncol = 3)
ys <- runCholeski(size = size, mus = mus, sigma = sigma)
pairs(ys)
```
## Ex 3.16
### Problem
Efron and Tibshirani discuss the `scor` (`bootstrap`) test score data on
88 students who took examinations in five subjects [91, Table 7.1], [194, Table 1.2.1]. Each row of the data frame is a set of scores $(x_{i1}, \ldots, x_{i5})$ for the $i^{th}$ student. Standardize the scores by type of exam. That is, standardize the bivariate samples $(X_1, X_2)$ (closed book) and the trivariate samples $(X_3, X_4, X_5)$ (open book). Compute the covariance matrix of the transformed sample of test scores.
### Solution
The closed-book and open-book score blocks are standardized separately, and the covariance matrix of the transformed sample of test scores is computed.
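The transformation whitens each block: with $A = \mathrm{chol}(\Sigma^{-1})$ (so $A^T A = \Sigma^{-1}$) applied to the centered scores, the transformed covariance is
$$A\,\Sigma\,A^T = A\,(A^T A)^{-1} A^T = A\,A^{-1} A^{-T} A^T = I,$$
so each standardized block should have identity covariance up to sampling error, while covariances between the two blocks are not constrained.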
```{r}
# standardize a block of scores to zero mean and identity covariance
normalize <- function (data) {
Sigma <- cov(data)
mus <- sapply(1:ncol(data), function (v) mean(data[,v]))
A <- chol(solve(Sigma))
t(A %*% (t(data)-mus))
}
# bootstrap scor
norm.scor <- scor
m1 <- normalize(data.matrix(norm.scor[,1:2]))
m2 <- normalize(data.matrix(norm.scor[,3:5]))
norm.scor[,1:2] = m1
norm.scor[,3:5] = m2
# covariance matrix
round(cov(norm.scor), digits=4)
```
## Ex 4.2
### Problem
A compound Poisson process is a stochastic process $\{X(t), t \ge 0\}$ that can be represented as the random sum $X(t) = \sum_{i=1}^{N(t)}{Y_i},\ t \ge 0,$ where $\{N(t), t \ge 0\}$ is a Poisson process and $Y_1, Y_2, \ldots$ are iid and independent of $\{N(t), t \ge 0\}$. Write a program to simulate a compound Poisson($\lambda$)-Gamma process ($Y$ has a Gamma distribution). Estimate the mean and the variance of $X(10)$ for several choices of the parameters and compare with the theoretical values. Hint: Show that $E[X(t)] = \lambda t E[Y_1]$ and $\mathrm{Var}(X(t)) = \lambda t E[Y_1^2]$.
### Solution
The estimated mean and variance are close to the theoretical values.
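The hinted identities follow by conditioning on $N(t)$:
$$E[X(t)] = E[N(t)]\,E[Y_1] = \lambda t\,E[Y_1],$$
$$\mathrm{Var}(X(t)) = E[N(t)]\,\mathrm{Var}(Y_1) + \mathrm{Var}(N(t))\,E[Y_1]^2 = \lambda t\,E[Y_1^2],$$
using $E[N(t)] = \mathrm{Var}(N(t)) = \lambda t$. For $Y_1 \sim \mathrm{Gamma}(r, s)$ with shape $r$ and scale $s$, $E[Y_1] = rs$ and $E[Y_1^2] = rs^2 + (rs)^2$.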
```{r}
lambda = 2
shape = 5
scale = 2
size = 10000
t = 10
# generate N(t) which follow the poisson process.
ns = rpois(size, t * lambda)
# generate X(t) as the sum of N(t) iid Gamma variates
# (when n = 0 the sum is correctly 0)
xs = sapply(ns, function (n) {
  sum(rgamma(n = n, shape = shape, scale = scale))
})
# Mean from function
(mean.s = mean(xs))
# mean theoretical
(mean.t = lambda * t * shape * scale)
# Variance
(var.s = var(xs))
# theoretical variance: Var(X(t)) = lambda * t * E[Y^2], where
# E[Y^2] = shape * scale^2 + (shape * scale)^2 for Gamma(shape, scale)
(var.t = lambda * t * (shape * scale^2 + (shape * scale)^2))
```