From f9d6cee76dda4647ad596d1b12c5070f5d4d01d8 Mon Sep 17 00:00:00 2001 From: jwagner31 Date: Tue, 12 Dec 2023 18:03:41 -0600 Subject: [PATCH] updates --- notebooks/FinalMilestone.ipynb | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/notebooks/FinalMilestone.ipynb b/notebooks/FinalMilestone.ipynb index 2203127..065cb1e 100644 --- a/notebooks/FinalMilestone.ipynb +++ b/notebooks/FinalMilestone.ipynb @@ -118598,7 +118598,7 @@ "id": "30df508c", "metadata": {}, "source": [ - "First, we fit the GAM to the hot data after splitting the data into a train and test set. We use a Poisson distribution, and perform a gridsearch over **lam**, the smoothing parameter, and **n_splines**, the number of splines to use. We predict the number of violent incidents within each gridbox on a given day based on the temperature. The summary statistics are slightly harder to interpret than your standard sklearn model, but the most important number we looked for was the Psuedo R-Squared which told us how well our model explained the variance within the data. " + "First, we fit the GAM to the hot data after splitting it into a train and test set (fitting takes about 2 minutes in total). We use a Poisson distribution and perform a grid search over **lam**, the smoothing parameter, and **n_splines**, the number of splines to use. We predict the number of violent incidents within each gridbox on a given day based on the temperature. The summary statistics are slightly harder to interpret than those of a standard sklearn model, but the most important number we looked for was the Pseudo R-Squared, which tells us how well our model explains the variance within the data. For the days where the temperature was greater than 90 degrees Fahrenheit, we were able to explain around 60% of the variance, which is in line with other studies. Of note is the effective degrees of freedom (DoF), which is quite high at about 38. This suggests a fairly flexible fit, so our model might be slightly overfitting. 
Unlike in an ordinary linear regression, the p-values for each spline term are not reliable for this model, so they are not of interest to us." ] }, { @@ -118668,6 +118668,14 @@ "print(hotgam.summary())" ] }, + { + "cell_type": "markdown", + "id": "fb769a60", + "metadata": {}, + "source": [ + "Now we make predictions on the test set for the hot data. The mean absolute percentage error is 0.471, which we explain below after predicting on the cold data." + ] + }, { "cell_type": "code", "execution_count": 234, @@ -118698,6 +118706,14 @@ "print(mape)\n" ] }, + { + "cell_type": "markdown", + "id": "7baeec6a", + "metadata": {}, + "source": [ + "Now, we follow the same process to fit the model to days where the temperature was less than 90 degrees Fahrenheit. This model takes longer to fit, up to 9 minutes. The pseudo R-squared is 0.5389, which is close to that of the model above and indicates a good fit to the data. " + ] + }, { "cell_type": "code", "execution_count": 235,