updates
Ye committed Nov 29, 2024
1 parent d0a5173 commit aeb51a2
Showing 4 changed files with 35 additions and 25 deletions.
40 changes: 20 additions & 20 deletions docs/labs/04.LogisticRegression.html
@@ -198,15 +198,15 @@
<h2 id="toc-title">Table of contents</h2>

<ul>
<li><a href="#preparing-the-input-variables" id="toc-preparing-the-input-variables" class="nav-link active" data-scroll-target="#preparing-the-input-variables"><span class="header-section-number">4.1</span> Preparing the input variables</a>
<li><a href="#preparing-the-input-variables" id="toc-preparing-the-input-variables" class="nav-link active" data-scroll-target="#preparing-the-input-variables"><span class="header-section-number">4.1</span> Preparing the input variables</a></li>
<li><a href="#implementing-a-logistic-regression-model" id="toc-implementing-a-logistic-regression-model" class="nav-link" data-scroll-target="#implementing-a-logistic-regression-model"><span class="header-section-number">4.2</span> <strong>Implementing a logistic regression model</strong></a>
<ul class="collapse">
<li><a href="#implementing-a-logistic-regression-model" id="toc-implementing-a-logistic-regression-model" class="nav-link" data-scroll-target="#implementing-a-logistic-regression-model"><span class="header-section-number">4.1.1</span> <strong>Implementing a logistic regression model</strong></a></li>
<li><a href="#model-fit" id="toc-model-fit" class="nav-link" data-scroll-target="#model-fit"><span class="header-section-number">4.1.2</span> <strong>Model fit</strong></a></li>
<li><a href="#statistical-significance-of-regression-coefficients-or-covariate-effects" id="toc-statistical-significance-of-regression-coefficients-or-covariate-effects" class="nav-link" data-scroll-target="#statistical-significance-of-regression-coefficients-or-covariate-effects"><span class="header-section-number">4.1.3</span> <strong>Statistical significance of regression coefficients or covariate effects</strong></a></li>
<li><a href="#interpreting-estimated-regression-coefficients" id="toc-interpreting-estimated-regression-coefficients" class="nav-link" data-scroll-target="#interpreting-estimated-regression-coefficients"><span class="header-section-number">4.1.4</span> <strong>Interpreting estimated regression coefficients</strong></a></li>
<li><a href="#prediction-using-fitted-regression-model" id="toc-prediction-using-fitted-regression-model" class="nav-link" data-scroll-target="#prediction-using-fitted-regression-model"><span class="header-section-number">4.1.5</span> Prediction using fitted regression model</a></li>
<li><a href="#model-fit" id="toc-model-fit" class="nav-link" data-scroll-target="#model-fit"><span class="header-section-number">4.2.1</span> <strong>Model fit</strong></a></li>
<li><a href="#statistical-significance-of-regression-coefficients-or-covariate-effects" id="toc-statistical-significance-of-regression-coefficients-or-covariate-effects" class="nav-link" data-scroll-target="#statistical-significance-of-regression-coefficients-or-covariate-effects"><span class="header-section-number">4.2.2</span> <strong>Statistical significance of regression coefficients or covariate effects</strong></a></li>
<li><a href="#interpreting-estimated-regression-coefficients" id="toc-interpreting-estimated-regression-coefficients" class="nav-link" data-scroll-target="#interpreting-estimated-regression-coefficients"><span class="header-section-number">4.2.3</span> <strong>Interpreting estimated regression coefficients</strong></a></li>
<li><a href="#prediction-using-fitted-regression-model" id="toc-prediction-using-fitted-regression-model" class="nav-link" data-scroll-target="#prediction-using-fitted-regression-model"><span class="header-section-number">4.2.4</span> Prediction using fitted regression model</a></li>
</ul></li>
<li><a href="#extension-activities" id="toc-extension-activities" class="nav-link" data-scroll-target="#extension-activities"><span class="header-section-number">4.2</span> <strong>Extension activities</strong></a></li>
<li><a href="#extension-activities" id="toc-extension-activities" class="nav-link" data-scroll-target="#extension-activities"><span class="header-section-number">4.3</span> <strong>Extension activities</strong></a></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/GDSL-UL/stats/edit/main/labs/04.LogisticRegression.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li></ul></div></nav>
</div>
@@ -556,8 +556,9 @@ <h2 data-number="4.1" class="anchored" data-anchor-id="preparing-the-input-varia
<div class="cell">
<div class="sourceCode cell-code" id="cb31"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a>sar_df<span class="sc">$</span>nssec <span class="ot">&lt;-</span> <span class="fu">relevel</span>(<span class="fu">as.factor</span>(sar_df<span class="sc">$</span>nssec), <span class="at">ref =</span> <span class="st">"2"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<section id="implementing-a-logistic-regression-model" class="level3" data-number="4.1.1">
<h3 data-number="4.1.1" class="anchored" data-anchor-id="implementing-a-logistic-regression-model"><span class="header-section-number">4.1.1</span> <strong>Implementing a logistic regression model</strong></h3>
</section>
<section id="implementing-a-logistic-regression-model" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="implementing-a-logistic-regression-model"><span class="header-section-number">4.2</span> <strong>Implementing a logistic regression model</strong></h2>
<p>The binary dependent variable is long-distance commuting, variable name <code>New_work_distance</code>.</p>
<p>The independent variables are gender and socio-economic status.</p>
<p>For gender, we use male as the baseline.</p>
@@ -646,9 +647,8 @@ <h3 data-number="4.1.1" class="anchored" data-anchor-id="implementing-a-logistic
<div class="sourceCode cell-code" id="cb41"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a>sar_df <span class="ot">&lt;-</span> sar_df <span class="sc">%&gt;%</span> <span class="fu">mutate</span>(<span class="at">New_nssec =</span> <span class="fu">if_else</span>(<span class="sc">!</span>nssec <span class="sc">%in%</span> <span class="fu">c</span>(<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">8</span>), <span class="st">"0"</span> ,nssec))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>Use “Other occupations” (code: 0) as the reference category by <code>relevel(as.factor())</code> and then create the regression model: <code>glm(New_work_distance~sex + New_nssec, data = sar_df, family= "binomial")</code>. Can you now run the model by yourself? Find the answer at the end of the practical.</p>
</section>
<section id="model-fit" class="level3" data-number="4.1.2">
<h3 data-number="4.1.2" class="anchored" data-anchor-id="model-fit"><span class="header-section-number">4.1.2</span> <strong>Model fit</strong></h3>
<section id="model-fit" class="level3" data-number="4.2.1">
<h3 data-number="4.2.1" class="anchored" data-anchor-id="model-fit"><span class="header-section-number">4.2.1</span> <strong>Model fit</strong></h3>
<p>We load the R library <code>pscl</code> to calculate the measures of fit.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb42"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1"><a href="#cb42-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span>(<span class="sc">!</span><span class="fu">require</span>(<span class="st">"pscl"</span>))</span>
@@ -709,14 +709,14 @@ <h3 data-number="4.1.2" class="anchored" data-anchor-id="model-fit"><span class=
</ul>
<p>In multiple linear regression, R-squared indicates the percentage of the variance in the dependent variable that is explained by the independent variables. In a logistic regression model, R-squared is not directly applicable. Instead, we use pseudo R-squared measures, such as McFadden’s pseudo R-squared or the Cox &amp; Snell pseudo R-squared, to provide an indication of model fit. For an individual-level dataset like the SAR, a value around 0.3 is considered to indicate a well-fitting model.</p>
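<p>As a rough illustration of what McFadden’s pseudo R-squared measures, the sketch below computes it by hand as one minus the ratio of the fitted model’s log-likelihood to that of an intercept-only model. The data frame <code>toy_df</code> and its variables are synthetic and purely hypothetical; they are not the SAR data used in this practical.</p>

```r
# Synthetic data for illustration only (hypothetical; not the SAR dataset)
set.seed(42)
n <- 500
toy_df <- data.frame(
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE),
               levels = c("Male", "Female")),
  x = rnorm(n)
)
# Simulate a binary outcome from a known logistic model
toy_df$y <- rbinom(n, 1, plogis(-1 + 0.5 * (toy_df$sex == "Female") + 0.8 * toy_df$x))

# Fitted model and intercept-only (null) model
fitted_model <- glm(y ~ sex + x, data = toy_df, family = "binomial")
null_model <- glm(y ~ 1, data = toy_df, family = "binomial")

# McFadden's pseudo R-squared: 1 - logLik(fitted) / logLik(null)
mcfadden_r2 <- 1 - as.numeric(logLik(fitted_model)) / as.numeric(logLik(null_model))
mcfadden_r2
```

<p>If the practical’s own model is passed to <code>pR2()</code> from <code>pscl</code>, the <code>McFadden</code> entry should correspond to the same quantity computed on that model.</p>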
</section>
<section id="statistical-significance-of-regression-coefficients-or-covariate-effects" class="level3" data-number="4.1.3">
<h3 data-number="4.1.3" class="anchored" data-anchor-id="statistical-significance-of-regression-coefficients-or-covariate-effects"><span class="header-section-number">4.1.3</span> <strong>Statistical significance of regression coefficients or covariate effects</strong></h3>
<section id="statistical-significance-of-regression-coefficients-or-covariate-effects" class="level3" data-number="4.2.2">
<h3 data-number="4.2.2" class="anchored" data-anchor-id="statistical-significance-of-regression-coefficients-or-covariate-effects"><span class="header-section-number">4.2.2</span> <strong>Statistical significance of regression coefficients or covariate effects</strong></h3>
<p>As with statistical inference in a linear regression model, the p-values of regression coefficients are used to assess whether the coefficients are statistically significant, for instance by comparing each p-value to the conventional significance level of 0.05:</p>
<p>·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; If the p-value of a coefficient is smaller than 0.05, the coefficient is statistically significant. In this case, you can say that the relationship between an independent variable and the outcome variable is <em>statistically</em> significant.</p>
<p>·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; If the p-value of a coefficient is larger than 0.05, the coefficient is not statistically significant. In this case, you can conclude that there is no statistically significant association or relationship between an independent variable and the outcome variable.</p>
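<p>The comparison described above can be sketched in R by pulling the p-value column out of the coefficient table returned by <code>summary()</code>. The model and data here (<code>toy_model</code>, <code>toy_df</code>) are synthetic and hypothetical, chosen only to show the mechanics.</p>

```r
# Synthetic data for illustration only (hypothetical; not the SAR dataset)
set.seed(1)
n <- 400
toy_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# x1 has a true effect on the outcome; x2 has none
toy_df$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * toy_df$x1))

toy_model <- glm(y ~ x1 + x2, data = toy_df, family = "binomial")

# The coefficient table holds estimates, standard errors, z values and p-values
coef_table <- summary(toy_model)$coefficients
p_values <- coef_table[, "Pr(>|z|)"]

# Flag coefficients whose p-value is below the conventional 0.05 level
significant <- p_values < 0.05
data.frame(estimate = coef_table[, "Estimate"],
           p_value = p_values,
           significant = significant)
```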
</section>
<section id="interpreting-estimated-regression-coefficients" class="level3" data-number="4.1.4">
<h3 data-number="4.1.4" class="anchored" data-anchor-id="interpreting-estimated-regression-coefficients"><span class="header-section-number">4.1.4</span> <strong>Interpreting estimated regression coefficients</strong></h3>
<section id="interpreting-estimated-regression-coefficients" class="level3" data-number="4.2.3">
<h3 data-number="4.2.3" class="anchored" data-anchor-id="interpreting-estimated-regression-coefficients"><span class="header-section-number">4.2.3</span> <strong>Interpreting estimated regression coefficients</strong></h3>
<ul>
<li><p>The interpretation of coefficients (B) and odds ratios (Exp(B)) for the independent variables differs from that in a linear regression setting.</p></li>
<li><p>Interpreting the regression coefficients.</p></li>
@@ -735,8 +735,8 @@ <h3 data-number="4.1.4" class="anchored" data-anchor-id="interpreting-estimated-
<p><strong>Q5.</strong> Could you identify significant factors of commuting over long distances?</p>
</div>
</section>
<section id="prediction-using-fitted-regression-model" class="level3" data-number="4.1.5">
<h3 data-number="4.1.5" class="anchored" data-anchor-id="prediction-using-fitted-regression-model"><span class="header-section-number">4.1.5</span> Prediction using fitted regression model</h3>
<section id="prediction-using-fitted-regression-model" class="level3" data-number="4.2.4">
<h3 data-number="4.2.4" class="anchored" data-anchor-id="prediction-using-fitted-regression-model"><span class="header-section-number">4.2.4</span> Prediction using fitted regression model</h3>
<p>Relating to this week’s lecture, the log odds of a person commuting over a long distance are equal to:</p>
<p>Log odds of long-distance commuting = 0.188 + 0.693 * sexFemale + 0.679 * nssec3 + 0.357 * nssec4 + 3.409 * nssec5 + 0.249 * nssec6 + 0.237 * nssec7 + 0.226 * nssec8</p>
<p>Using R, you can create the objects you would like to predict for. Here we have created three people; see whether you can identify their gender and socio-economic classification.</p>
@@ -754,8 +754,8 @@ <h3 data-number="4.1.5" class="anchored" data-anchor-id="prediction-using-fitted
<p>So let us look at these three people. For the first one, a male classified as being in a Semi-routine occupation in NSSEC, the probability of travelling over 60km to work is only 4.26%. For the second one, a female in a Lower managerial and professional occupation, the probability of long-distance commuting is 8.11%. You can now read off the prediction outcome for our last person in the same way.</p>
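<p>To make the link between log odds and predicted probabilities concrete, here is a minimal sketch on synthetic data. It assumes a toy model with a single predictor, <code>sex</code>; the names <code>toy_df</code>, <code>toy_model</code> and <code>new_people</code> are hypothetical and the numbers will not match those of the practical. The probability is obtained from the log odds as exp(log odds) / (1 + exp(log odds)), which is what <code>predict()</code> returns with <code>type = "response"</code>.</p>

```r
# Synthetic data for illustration only (hypothetical; not the SAR dataset)
set.seed(7)
n <- 600
toy_df <- data.frame(
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE),
               levels = c("Male", "Female"))
)
toy_df$y <- rbinom(n, 1, plogis(-2 + 0.7 * (toy_df$sex == "Female")))

toy_model <- glm(y ~ sex, data = toy_df, family = "binomial")

# Two hypothetical people to predict for, using the same factor levels as the model
new_people <- data.frame(sex = factor(c("Male", "Female"),
                                      levels = c("Male", "Female")))

# Predicted log odds (the linear predictor)
log_odds <- predict(toy_model, newdata = new_people, type = "link")

# Convert log odds to probabilities by hand ...
prob_manual <- exp(log_odds) / (1 + exp(log_odds))

# ... or ask predict() for probabilities directly
prob_direct <- predict(toy_model, newdata = new_people, type = "response")

cbind(log_odds, prob_manual, prob_direct)
```

<p>The two probability columns should agree, which is a quick way to check that a manual calculation from the regression equation has been done correctly.</p>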
</section>
</section>
<section id="extension-activities" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="extension-activities"><span class="header-section-number">4.2</span> <strong>Extension activities</strong></h2>
<section id="extension-activities" class="level2" data-number="4.3">
<h2 data-number="4.3" class="anchored" data-anchor-id="extension-activities"><span class="header-section-number">4.3</span> <strong>Extension activities</strong></h2>
<p>The extension activities are designed to help you prepare for Assignment 2, which is in progress. This week, see whether you can:</p>
<ul>
<li><p>Select a regression strategy and explain why a linear or logistic model is appropriate</p></li>