% section-ConfidenceInterval.tex (forked from mbirgen/MathofDemocracy)
\subsection{Confidence Intervals}
We use statistics calculated from samples to estimate parameters, which describe something that is true for a whole population. To keep the two ideas apart, we put a \textbf{hat} on top of the symbols that represent statistics. In this section, we will see how to estimate a parameter from a statistic. For large random samples, about 95\% of all samples give a sample proportion ($\hat p$, a statistic) within two standard deviations of the population proportion ($p$, a parameter). That is, about 95\% of all samples catch the true value of $p$ in the interval extending two standard deviations on either side of the sample proportion $\hat p$.
This idea is captured by the concept of a confidence interval. Keep in mind that we are now working with proportions (percentages) rather than counts, so we use the margin of error for the proportion, which is the margin of error for the count divided by $n$:
\[ \frac{2\sqrt{n\hat p(1-\hat p)}}{n} = 2\sqrt{\frac{\hat p(1-\hat p)}{n}}. \]
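As a rough sketch of the calculation just described, the Python function below computes $\hat p$, takes the standard deviation of the count (as for an unfair coin), divides it by $n$ to get the standard deviation of the proportion, and forms $\hat p$ plus and minus two standard deviations. The function name \texttt{proportion\_interval} and the parameter \texttt{z} are hypothetical choices for illustration; the default \texttt{z = 2} mirrors the two-standard-deviation rule used in this section. The example call uses the shopping survey that appears later in this section (60\% of 2500 adults, that is, 1500 ``successes'').

```python
import math


def proportion_interval(successes: int, n: int, z: float = 2.0):
    """Approximate interval p_hat +/- z standard deviations for a proportion.

    z = 2 follows the two-standard-deviation (about 95%) rule; it is an
    illustrative default, not a value taken from any library.
    """
    p_hat = successes / n
    # Standard deviation of the number of successes, as for an unfair coin.
    sd_count = math.sqrt(n * p_hat * (1 - p_hat))
    # Dividing by n gives the standard deviation of the proportion,
    # which equals sqrt(p_hat * (1 - p_hat) / n).
    sd_prop = sd_count / n
    margin = z * sd_prop
    return p_hat, sd_prop, p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Shopping survey: 60% of a simple random sample of 2500 adults.
    p_hat, sd, low, high = proportion_interval(1500, 2500)
    print(f"p_hat = {p_hat:.2f}, sd = {sd:.4f}, interval {low:.3f} to {high:.3f}")
    # prints: p_hat = 0.60, sd = 0.0098, interval 0.580 to 0.620
```

Rounded to two decimals, the endpoints are 0.58 and 0.62, matching the values marked on the shopping-survey graph in the solutions.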
\begin{enumerate}
\item \boxedblank{\textbf{Confidence Interval:}\ifsolns \par A \textbf{95\% confidence interval} is an interval computed from the sample data by a method that produces an interval containing the true population parameter for 95\% of all samples. A 95\% confidence interval for $p$ is approximately
$\displaystyle \hat{p}\pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
This formula is only approximate, but it is quite accurate when the sample size is larger than 30 and $\hat{p}$ is not too close to 0 or 1.\fi \vfill
} \index{confidence interval}
\item \boxedblank{\textbf{Margin of Error:}\ifsolns \par
The margin of error is half the width of a confidence interval. For a 95\% confidence interval, it is about two standard deviations of the sampling distribution of the statistic used as the estimate. If you conducted a very large number of polls, then about 95\% of the time the difference between a poll's result and the true value of the population parameter would be within the margin of error.
\fi} \index{margin of error}
\item Suppose we want to estimate the proportion of adults who find shopping frustrating. In a simple random sample of size 2500, the sample proportion is $\hat{p} = 60\%$. The graph below represents the possible true values of the population parameter. Place the sample proportion at the center and calculate the standard deviation, starting with the standard deviation of an unfair coin. Enter the values of the mean plus and minus two standard deviations in the appropriate places on the graph.
\pgfmathdeclarefunction{gauss}{3}{%
\pgfmathparse{3/(#3*sqrt(2*pi))*exp(-((#1-#2)^2)/(2*#3^2))}%
}
\begin{tikzpicture}
\begin{axis}[
no markers,
domain=0:8,
samples=100,
ymin=0,
axis lines*=left,
xlabel=$x$,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm,
width=12cm,
xtick=\empty,
ytick=\empty,
enlargelimits=false,
clip=false,
axis on top,
grid = major,
hide y axis
]
\addplot [very thick,cyan!50!black] {gauss(x,4,1)};
\pgfmathsetmacro\valueA{gauss(1,4,1)}
\pgfmathsetmacro\valueB{gauss(2,4,1)}
\pgfmathsetmacro\valueC{gauss(3,4,1)}
\pgfmathsetmacro\valueD{gauss(4,4,1)}
%\draw[fill=gray] plot[smooth, samples=500,domain=0:2] (\x,gauss(\x,4,1)) -- (2,0) -- cycle;
%\addplot [fill=gray,domain=0:2] {gauss(x,4,1)}\closedcycle ;
%\addplot [fill=gray,domain=6:8] {gauss(x,4,1)}\closedcycle ;
%\addplot [fill=gray, fill opacity=0.20,domain=2:3] {gauss(x,4,1)}\closedcycle ;
%\addplot [fill=gray, fill opacity=0.20,domain=5:6] {gauss(x,4,1)}\closedcycle ;
%\addplot [fill=gray, fill opacity=0.05,domain=3:5] {gauss(x,4,1)}\closedcycle ;
%\pgfmathsetmacro\valueE{gauss(5,4,1)}
%\draw [gray] (axis cs:1,0) -- (axis cs:1,\valueA);
%(axis cs:5,0) -- (axis cs:5,\valueA);
\draw [gray] (axis cs:2,0) -- (axis cs:2,\valueB);
%(axis cs:4,0) -- (axis cs:4,\valueB);
%\draw [gray] (axis cs:3,0) -- (axis cs:3,\valueC);
%(axis cs:5,0) -- (axis cs:5,\valueC);
\draw [gray] (axis cs:4,0) -- (axis cs:4,\valueD);
%(axis cs:5,0) -- (axis cs:5,\valueD);
%\draw [gray] (axis cs:5,0) -- (axis cs:5,\valueC);
\draw [gray] (axis cs:6,0) -- (axis cs:6,\valueB);
%\draw [gray] (axis cs:7,0) -- (axis cs:7,\valueA);
%\draw [yshift=1.3cm, latex-latex](axis cs:3, 0) -- node [fill=white] {\small$68$\% of data} (axis cs:5, 0);
\draw [yshift=.7cm, latex-latex](axis cs:2, 0) -- node [fill=white] {\parbox[c][4em][c]{3.5 cm}{\small$95$\% of all samples give a result within this interval}} (axis cs:6, 0);
%\draw [yshift=0.2cm, latex-latex](axis cs:1, 0) -- node [fill=white] {\small$99.5$\% of data} (axis cs:7, 0);
\ifsolns
\node[below] at (axis cs:2, 0) {.58};
\node[below] at (axis cs:6, 0) {.62};
\fi
%\node[above, left] at (axis cs:2, \valueB) {$0.58$};
%\node[above, right] at (axis cs:5, \valueB) {$0.62};
\end{axis}
\end{tikzpicture}
\clearpage
\item At a college in Singapore, students were randomly selected and asked to complete a web-based survey about sexual behavior. Of the 534 students who did, 24\% reported having had sexual intercourse in the past six months.
\begin{enumerate}
\item What are the mean and standard deviation of the sample proportion $\hat{p}$ of students who had sexual intercourse in the past six months?
\ifsolns
\[ \hat{p}=24\%,\qquad \sigma = 0.0185\]
\fi
\vfill
\item In what interval do the proportions from 95\% of all samples fall? \ifsolns
\[ 20.3\% \text{ to } 27.7\%\]
\fi
\vfill
\item In what interval do the proportions from 99.7\% of all samples fall? \ifsolns
\[ 18.5\% \text{ to } 29.5\%\]
\fi
\vfill
\end{enumerate}
\item Suppose the college had selected 400 students. What interval would cover the middle 95\% of values then? What if the college had selected 1600 students? What if it had selected 6400 students? \ifsolns
\par
\begin{tabular}{ccc}
$n$ & $\sigma$ & Range\\
400 & 0.0214 & 0.197 to 0.283\\
1600 & 0.0107 & 0.219 to 0.261\\
6400 & 0.0053 & 0.229 to 0.251\\
\end{tabular}
\fi
\vfill
\clearpage
\item In a random sample of students who took the SAT Reasoning college entrance exam twice, 427 of the respondents had paid for coaching courses and the remaining 2733 had not. Give a 95\% confidence interval for the proportion of students retaking the SAT who had paid for coaching.
\ifsolns
\[ \hat{p}=\frac{427}{3160}\approx 13.5\%,\qquad \sigma \approx 0.006\]
\[ 12.3\% \text{ to } 14.7\%\]
\fi
\vfill
\item A Gallup poll asked 1785 randomly selected adults whether they had attended a house of worship in the previous seven days. Of the respondents, 750 said ``yes.'' Give a 95\% confidence interval for the proportion of all adults who claim that they attended a house of worship during the week preceding the poll. \ifsolns
\[ \hat{p}=\frac{750}{1785}\approx 42.0\%,\qquad \sigma \approx 0.0117\]
\[ 39.7\% \text{ to } 44.4\%\]
\fi
\vfill
\clearpage
\item In an automated telephone poll by Gravis Marketing/One America News of 1051 registered voters on November 23, 2015, 523 likely Republican voters were contacted. Of those people, 37\% said they would vote for Donald Trump in the Republican primary.
\begin{enumerate}
\item Give a 95\% confidence interval for the percentage of registered Republicans who would vote for Donald Trump. \ifsolns
\[ \hat{p}=37\%,\qquad \sigma \approx 0.021\]
\[ 32.8\% \text{ to } 41.2\%\]
\fi
\vfill
\item In the same poll, Ben Carson received 15\%, Marco Rubio received 14\%, and Ted Cruz received 12\%. Is there any evidence of a difference in support among these three lower-ranking candidates?
\ifsolns
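\[ \sigma \approx 0.015\]
Each margin of error is about $2\sigma \approx 3$ percentage points, so the intervals $15\% \pm 3\%$, $14\% \pm 3\%$, and $12\% \pm 3\%$ overlap: the poll gives no clear evidence of a difference in support among these three candidates.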
\fi
\vfill
\end{enumerate}
\item In an automated telephone poll by Gravis Marketing/One America News of 1051 registered voters on November 23, 2015, 528 likely Democratic voters were contacted. Of those people, 59\% said they would vote for Hillary Clinton in the Democratic primary.
\begin{enumerate}
\item Give a 95\% confidence interval for the percentage of registered Democrats who would vote for Hillary Clinton. \vfill
\item In the same poll, Bernie Sanders received 32\% and Martin O'Malley received 8\%. Is there any evidence of a difference in support among these three candidates? \vfill
\end{enumerate}
\clearpage
%\item Gallup's November update of Americans' 2015 holiday spending intentions finds U.S. adults planning to spend \$830 on Christmas gifts this year, on average. That is up sharply from the \$720 recorded a year ago, and is significantly higher than what consumers have indicated in any November since 2007. The poll was based on telephone interviews conducted Nov. 4-8, 2015, with a random sample of 1,021 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. The margin of sampling error for the average Christmas spending estimate is $\pm\$63$ at the 95\% confidence level.
%\begin{enumerate}
%\item Give the 95\% confidence interval for the average amount of money Americans are going to spend on Christmas gifts this year. \vfill
%\item Is there statistical evidence for the statement that Americans will spend significantly more than a year ago? Explain. \vfill
%\end{enumerate}
\item A telephone survey of 880 randomly selected drivers asked, ``Recalling the last 10 traffic lights you drove through, how many of them were red when you entered the intersection?'' Of the 880 respondents, 171 admitted that at least one light had been red.
\begin{enumerate}
\item Give the 95\% confidence interval for the proportion of all drivers who ran one or more of the last 10 red lights they met. \vfill
\item A practical problem with this survey is that people may not give truthful answers. What is the likely direction of the bias: do you think more or fewer than 171 of the 880 respondents really ran a red light? Why? \vfill
\end{enumerate}
\item Consider the margin of error formula $\displaystyle 2 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
\begin{enumerate}
\item For a fixed value of $n$, what value of $\hat{p}$ between zero and one causes this formula to attain its largest possible value? \vfill
\item Using the answer above, what would be a simplified (and slightly more conservative) formula for calculating the margin of error? \vfill
\end{enumerate}
\end{enumerate}
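The definition of a 95\% confidence interval is a statement about repeated sampling: if we polled over and over, about 95\% of the intervals $\hat p \pm 2\sqrt{\hat p(1-\hat p)/n}$ would catch the true value of $p$. One rough way to see this is to simulate many polls from a population whose true proportion we choose ourselves. The Python sketch below does that under stated assumptions: the population proportion (0.60), the sample size (1000), the number of simulated polls, and the function names \texttt{covers} and \texttt{coverage} are all hypothetical, and each respondent is modeled as an independent draw, which approximates a simple random sample from a large population.

```python
import math
import random


def covers(p_true: float, n: int, rng: random.Random, z: float = 2.0) -> bool:
    """Simulate one poll of size n and report whether its interval
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) contains p_true."""
    # Each respondent independently says "yes" with probability p_true
    # (assumption: a large population, so draws are nearly independent).
    yes = sum(1 for _ in range(n) if rng.random() < p_true)
    p_hat = yes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p_true <= p_hat + margin


def coverage(p_true: float, n: int, trials: int, seed: int = 1) -> float:
    """Fraction of `trials` simulated polls whose interval catches p_true."""
    rng = random.Random(seed)  # fixed seed so reruns give the same result
    hits = sum(covers(p_true, n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Hypothetical population in which 60% would answer "yes".
    print(coverage(p_true=0.60, n=1000, trials=1000))
```

With these settings the printed fraction should land close to 0.95. Rerunning with a proportion near 0 and a small sample (for example \texttt{p\_true=0.02}, \texttt{n=50}) gives coverage well short of 95\%, one reason the formula is described as only approximately correct.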
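The question about a college surveying 400, 1600, or 6400 students illustrates a general pattern: because $n$ sits under a square root in $\sqrt{\hat p(1-\hat p)/n}$, quadrupling the sample size cuts the standard deviation, and hence the margin of error, in half. A minimal Python sketch of that calculation follows, assuming the Singapore survey's $\hat p = 0.24$ and the two-standard-deviation rule; the helper name \texttt{sd\_of\_proportion} is hypothetical.

```python
import math


def sd_of_proportion(p_hat: float, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p_hat(1 - p_hat)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


if __name__ == "__main__":
    p_hat = 0.24  # proportion reported in the Singapore survey
    for n in (400, 1600, 6400):
        sd = sd_of_proportion(p_hat, n)
        low, high = p_hat - 2 * sd, p_hat + 2 * sd  # mean +/- 2 standard deviations
        print(f"n = {n:4d}: sd = {sd:.4f}, interval {low:.3f} to {high:.3f}")
    # prints:
    # n =  400: sd = 0.0214, interval 0.197 to 0.283
    # n = 1600: sd = 0.0107, interval 0.219 to 0.261
    # n = 6400: sd = 0.0053, interval 0.229 to 0.251
```

Each fourfold increase in $n$ halves the standard deviation, so shrinking the margin of error takes disproportionately larger samples.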
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\HOMEWORK
\begin{Senumerate}
\item A Gallup poll conducted February 1-3, 2010, by telephone interviews with 1025 randomly selected American adults found that 646 of them said their sympathies in the Middle East situation lie more with the Israelis than with the Palestinians.
\begin{enumerate}
\item Give a 95\% confidence interval for the proportion of all American adults whose sympathies in the Middle East situation lie more with the Israelis than with the Palestinians. \vfill
\item In theory, in 19 out of 20 cases, the survey results will differ by no more than 20 percentage points in either direction from what would have been obtained by seeking out all American adults. Explain how your results agree with this statement.\vfill
\end{enumerate}
\item A Gallup poll conducted May 3-6, 2010, by telephone interviews with 1029 American adults found that 52\% of Americans called gay and lesbian relations morally acceptable. This was the first year that the figure crossed the symbolic 50\% threshold, a shift largely due to a change in views among younger men.
\begin{enumerate}
\item How many of the 1029 people interviewed said gay and lesbian relations were morally acceptable? \vfill
\item Gallup says that the margin of error for the poll is plus or minus 4 percentage points. Explain to someone who knows nothing about statistics what ``margin of error plus or minus 4 percentage points'' means. \vfill
\item Give a 95\% confidence interval for this survey. Does your margin of error agree with the four percentage points announced by Gallup? \vfill
\end{enumerate}
\hwnewpage
\item Americans, by 60\% to 37\%, oppose plans for the U.S. to take in at least 10,000 Syrian refugees who are trying to escape the civil war in their country. This is in keeping with Americans' historical tendency to oppose taking in large numbers of refugees, something that has been evident in similar situations as far back as the 1930s.
Results for this Gallup poll are based on telephone interviews conducted Nov. 20-21, 2015, on the Gallup U.S. Daily survey, with a random sample of 1,013 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. For results based on the total sample of national adults, the margin of sampling error is $\pm 4$ percentage points at the 95\% confidence level. All reported margins of sampling error include computed design effects for weighting.
Each sample of national adults includes a minimum quota of 60\% cellphone respondents and 40\% landline respondents, with additional minimum quotas by time zone within region. Landline and cellular telephone numbers are selected using random-digit-dial methods.
\begin{enumerate}
\item Give the 95\% confidence interval for the percentage of Americans who oppose plans for the U.S. to take in Syrian refugees. \vfill
\item Give a 99.7\% confidence interval for the percentage of Americans who oppose plans for the U.S. to take in Syrian refugees. \vfill
\item Does the sampling technique give you assurance that the sample is representative of American opinion? Why or why not? \vfill
\end{enumerate}
\hwnewpage
\item The percentage of one's body weight that is fat is a key indicator of fitness. The various ways to estimate it have different margins of error (given in percentage points):
\begin{center}
\newcolumntype{C}[1]{>{\centering}m{#1}}
\begin{tabular}{|C{2cm}|C{2cm}|C{2cm}|C{2cm}|C{2cm}|}
\hline
Method & Caliper pinch & Bioelectrical impedance & Body mass index calculator & Hydrostatic weighing (dunk test) \tabularnewline \hline
Margin of error & $\pm 3$ & $\pm 4$ & $\pm 10$ & $\pm 1$ \tabularnewline \hline
\end{tabular}
\end{center}
\begin{enumerate}
\item Which of these tests is the least accurate? \vfill
\item If the pinch test says that you have 21\% body fat, what is the 95\% confidence interval for this estimate? \vfill
\end{enumerate}
\end{Senumerate} \ENDHOMEWORK
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%