diff --git a/exercises/Chapter4.ipynb b/exercises/Chapter4.ipynb index 598dc88..ae85bc8 100644 --- a/exercises/Chapter4.ipynb +++ b/exercises/Chapter4.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Chapter 4 Exercises 1-5" + "# Chapter 4 Exercises" ] }, { @@ -13,15 +13,17 @@ "metadata": {}, "outputs": [], "source": [ + "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", - "from scipy import stats\n", - "from scipy.special import expit as logistic\n", - "import arviz as az\n", "import pymc3 as pm\n", "import seaborn as sns\n", "import theano.tensor as tt\n", + "\n", + "from scipy import stats\n", + "from scipy.special import expit as logistic\n", + "\n", "np.random.seed(123)" ] }, @@ -30,7 +32,9 @@ "metadata": {}, "source": [ "## Exercise 1\n", - "***" + "***\n", + "\n", + "*Re-run the first model using the petal length and then petal width variables. What are the main differences in the results? How wide or narrow is the 95% HPD interval in each case?*" ] }, { @@ -43,7 +47,7 @@ "df = iris.query(\"species == ('setosa', 'versicolor')\")\n", "y_0 = pd.Categorical(df['species']).codes\n", "\n", - "varnames = ['α', 'β', 'bd']\n" + "varnames = ['α', 'β', 'bd']" ] }, { @@ -141,6 +145,7 @@ " yl = pm.Bernoulli('yl', p=θ, observed=y_0)\n", "\n", " trace_0 = pm.sample(1000)\n", + " \n", " print(\"Feature {} summary\".format(feature))\n", " print(az.summary(trace_0, varnames, credible_interval=.95))" ] @@ -149,7 +154,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "From the results we can see that bd variable HPD is the smallest with sepal length, and increases with petal_width, and petal_length." + "From the results, we can see that the `bd` variable's HPD is the smallest with sepal length, and increases with petal_width and petal_length." ] }, { @@ -157,7 +162,9 @@ "metadata": {}, "source": [ "## Exercise 2\n", - "***" + "***\n", + "\n", + "*Repeat exercise 1, this time using a Student's t-distribution as a weakly informative prior. Try different values of $\\nu$.*" ] }, { @@ -243,7 +250,7 @@ } ], "source": [ - "for nu in [1,10, 30]:\n", + "for nu in [1, 10, 30]:\n", "\n", " x_0 = df[\"petal_length\"].values\n", " x_c = x_0 - x_0.mean()\n", @@ -261,7 +268,8 @@ " yl = pm.Bernoulli('yl', p=θ, observed=y_0)\n", "\n", " trace_0 = pm.sample(1000)\n", - " print(\"Feature {} nu {} summary\".format(feature, nu))\n", + " \n", + " print(f\"Feature {feature} nu {nu} summary\")\n", " print(az.summary(trace_0, varnames, credible_interval=.95))" ] }, @@ -270,7 +278,11 @@ "metadata": {}, "source": [ "## Exercise 3\n", - "***" + "***\n", + "\n", + "*Go back to the first example, the logistic regression for classifying setosa or versicolor given sepal length. Try to solve the same problem using a simple linear regression model, as we saw in chapter 3. How useful is linear regression compared to logistic regression? 
Can the result be interpreted as a probability?*\n", + "\n", + "*Tip: check whether the values of $y$ are restricted to the interval [0,1].*" ] }, { @@ -314,7 +326,9 @@ " μ = α + pm.math.dot(x_c, β)\n", "\n", " yl = pm.Normal('yl', mu=μ, sd=sd, observed=y_0)\n", + " \n", " trace_linear = pm.sample(1000)\n", + " \n", " posterior_predictive_linear = pm.sample_posterior_predictive(trace_linear)\n", " print(az.summary(trace_linear, credible_interval=.95))" ] @@ -350,14 +364,14 @@ ], "source": [ "data = az.from_pymc3(trace=trace_linear, posterior_predictive=posterior_predictive_linear)\n", - "az.plot_ppc(data)" + "az.plot_ppc(data);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "From the posterior predictive check this model is not very useful. We are trying to estimate the probability of a species given a sepal_length, but a number of the posterior predictive check values are below 0 and above 1. As is such the result cannot be interpreted as probability." + "From the posterior predictive checks, this model is not very useful. We are trying to estimate the probability of a species given sepal_length, but a number of the posterior predictive check values are below 0 and above 1. As such, the result cannot be interpreted as a probability." ] }, { @@ -365,7 +379,9 @@ "metadata": {}, "source": [ "## Exercise 4\n", - "***" + "***\n", + "\n", + "*In the example from the \"Interpreting the coefficients of a logistic regression\" section, we changed `sepal_length` by 1 unit. Using figure 4.6, corroborate that the value of `log_odds_versicolor_i` corresponds to the value of `probability_versicolor_i`. Do the same for `log_odds_versicolor_f` and `probability_versicolor_f`. Just by noting that `log_odds_versicolor_f` - `log_odds_versicolor_i` is negative, what can you say about the probability? Use figure 4.6 to help you. Is this result clear to you from the definition of log-odds?*" ] }, { @@ -523,7 +539,6 @@ "log_odds_versicolor_i = (summary['mean'] * [1, x_1, x_2]).sum()\n", "probability_versicolor_i = logistic(log_odds_versicolor_i)\n", "\n", - "\n", "log_odds_versicolor_f = (summary['mean'] * [1, x_1, x_2+1]).sum()\n", "probability_versicolor_f = logistic(log_odds_versicolor_f)\n", "\n", @@ -534,9 +549,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The value of -5.19 is consistent across the summary and our \"hand check\". A log odds value of -5.1 means that as $\\beta_1$ increases, the probability that the species is versicolor decreases, or equivalently as sepal_width increases, the probability the flower is versicolor decreases.\n", + "The value of -5.22 is consistent with the summary and our \"hand check\". A log odds value of -5.22 means that as $x_2$ increases by one unit, the probability that the species is versicolor decreases. Or, equivalently, as sepal width increases, the probability that the flower is versicolor decreases.\n", "\n", - "We can verify with a quick plot" + "We can verify this with a quick plot:" ] }, { @@ -569,22 +584,24 @@ ], "source": [ "colors = df[\"species\"].replace({'setosa':\"blue\", 'versicolor':\"green\"})\n", - "df.plot(kind=\"scatter\", x=\"sepal_length\", y=\"sepal_width\", c=colors)" + "df.plot(kind=\"scatter\", x=\"sepal_length\", y=\"sepal_width\", c=colors);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "As sepal with increases from 3 to 4 we get farther from the green dots, reducing the probability that the flower were seeing is of the virginica species." 
+ "We see that, as sepal width increases from 3 to 4, we get further away from the green dots, reducing the probability that the flower we're seeing is of the versicolor species." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Question 5\n", - "***" + "## Question 5\n", + "***\n", + "\n", + "*Use the same example from the previous exercise. For `model_1`, check how much the log-odds change when increasing `sepal_length` from 5.5 to 6.5 (spoiler: it should be 4.66). How much does the probability change? How does this increase compared to when we increase `sepal_length` from 4.5 to 5.5?*" ] }, { @@ -613,14 +630,14 @@ "x_2 = 3 # sepal_width \n", "\n", "for i in (0,1):\n", - " log_odds_versicolor_i = (summary['mean'] * [1, x_1+i, x_2]).sum()\n", + " log_odds_versicolor_i = (summary['mean'] * [1, x_1 + i, x_2]).sum()\n", " probability_versicolor_i = logistic(log_odds_versicolor_i)\n", "\n", "\n", - " log_odds_versicolor_f = (summary['mean'] * [1, x_1+i+1, x_2]).sum()\n", + " log_odds_versicolor_f = (summary['mean'] * [1, x_1 + i + 1, x_2]).sum()\n", " probability_versicolor_f = logistic(log_odds_versicolor_f)\n", "\n", - " print(f\"\"\"sepal_length_i {x_1+i}, sepal_length_f {x_1+i+1}\n", + " print(f\"\"\"sepal_length_i {x_1 + i}, sepal_length_f {x_1 + i + 1}\n", " Log Odds Change {log_odds_versicolor_f - log_odds_versicolor_i}\n", " Probability Change {probability_versicolor_f - probability_versicolor_i}\n", " \"\"\")" @@ -630,7 +647,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "From from the calculation above we see that while the coefficient stays constant, as it should in linear regression, the probability change is not as large from 5.5 to 6.5 as it is from 4.5 to 5.5. Looking at the graphic this intuitively makes sense as well. In the region of sepal length of 4.5 to 5.5, it's hard to differentiate among the species. However when moving from a sepal length of 5.5 to 6.5, only the virginica species has sepal lengths in that range." + "From the calculation above we see that while the log-odds change stays constant, as it should in linear regression, the probability change is not as large from 5.5 to 6.5 as it is from 4.5 to 5.5. Looking at the graphic this intuitively makes sense as well. When sepal length is at 4.5, the chance that the species is versicolor is very small. When sepal length jumps to 5.5, this probability gets a lot bigger. This means that subsequently going from 5.5 to 6.5 still increases the probability of versicolor, but not as much - because, well, at 5.5 there is already a good chance that the species we're seeing is versicolor." ] }, { @@ -638,7 +655,9 @@ "metadata": {}, "source": [ "## Exercise 6\n", - "***" + "***\n", + "\n", + "*In the example for dealing with unbalanced data, change `df = df[45:]` to `df = df[22:78]`. This will keep roughly the same number of data points, but now the classes will be balanced. Compare the new result with the previous ones. 
Which one is more similar to the example using the complete dataset?*" ] }, { @@ -650,7 +669,8 @@ "source": [ "iris = pd.read_csv('../code/data/iris.csv')\n", "\n", "df = iris.query(\"species == ('setosa', 'versicolor')\") \n", - "df = df[22:78] \n", + "df = df[22:78]\n", + "\n", "y_3 = pd.Categorical(df['species']).codes \n", "x_n = ['sepal_length', 'sepal_width'] \n", "x_3 = df[x_n].values\n", @@ -740,7 +760,8 @@ ], "source": [ "idx = np.argsort(x_3[:,0]) \n", - "bd = trace_3['bd'].mean(0)[idx] \n", + "bd = trace_3['bd'].mean(0)[idx]\n", + "\n", "plt.scatter(x_3[:,0], x_3[:,1], c= [f'C{x}' for x in y_3]) \n", "plt.plot(x_3[:,0][idx], bd, color='k')\n", "\n", @@ -754,7 +775,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The decision boundary in this plot looks more like the unfiltered dataset as the blue datapoints are largely not contained in the 95% HPD. This indicates that the balanced model, even with less data points, is better able to distinguish between classes." + "The decision boundary in this plot looks more like the unfiltered dataset, as the blue data points are largely not contained in the decision boundary's 95% HPD. This indicates that the balanced model, even with fewer data points, is better able to distinguish between classes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 7\n", - "***" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Lets run the model to have data points for a discussion" + "***\n", + "\n", + "*Suppose instead of a softmax regression we use a simple linear model by coding `setosa = 0`, `versicolor = 1` and `virginica = 2`. Under the simple linear regression model, what will happen if we switch the coding? Will we get the same or different results?*\n", + "\n", + "Let's run the model so we have some results to discuss:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -798,7 +816,7 @@ "\n", "with pm.Model() as model_s:\n", " α = pm.Normal('α', mu=0, sd=5, shape=3)\n", - " β = pm.Normal('β', mu=0, sd=5, shape=(4,3))\n", + " β = pm.Normal('β', mu=0, sd=5, shape=(4, 3))\n", " μ = pm.Deterministic('μ', α + pm.math.dot(x_s, β))\n", " \n", " θ = tt.nnet.softmax(μ)\n", @@ -856,7 +874,7 @@ "metadata": {}, "source": [ "#### Conceptual Understanding\n", - "Note that the shape of the trace. The dimensions should read as follows. We have 4000 estimations of the 3 softmax class values for each of the 150 rows in the dataset." + "Note the shape of the trace. The dimensions should read as follows: we have 4000 estimations of the 3 softmax class values for each of the 150 rows in the dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Discussion\n", - "If we changed a softmax model to a linear model regression model a couple things would change. First would be the interpretation of the final output. A softmax prediction estimates the probability of each class, whereas a linear regression would just provide one number as an estimate for the class. The other problem is that a liner regression would output a continous value across all real numbers, and how to define when one class starts and another ends is unclear." + "If we changed the softmax model to a linear regression model, a couple of things would change. First, the interpretation of the final output would be different. A softmax prediction estimates the probability of each class, whereas a linear regression would just provide one number as an estimate for the class.
The other problem is that a linear regression would output continuous values across all real numbers, and how to define when one class starts and another ends is unclear." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 8\n", - "***" + "***\n", + "\n", + "*Compare the likelihood of the logistic model versus the likelihood of the LDA model. Use the `sample_posterior_predictive` function to generate predicted data and compare the types of data you get for both cases. Be sure you understand the difference between the types of data the model predicts.*" ] }, { @@ -923,7 +943,8 @@ " θ = pm.Deterministic(\"θ\", pm.math.sigmoid(μ))\n", " bd = pm.Deterministic(\"bd\", -α / β )\n", " \n", - " y1= pm.Bernoulli(\"y1\", p=θ, observed = y_3)\n", + " y1 = pm.Bernoulli(\"y1\", p=θ, observed=y_3)\n", + " \n", " trace_logistic = pm.sample(size=2000)\n", " ppc_logistic = pm.sample_posterior_predictive(trace_logistic)" ] }, @@ -982,7 +1003,7 @@ "Setosa_{sepal\\_length} \\text{~} Normal(\\mu_1, \\sigma)\n", "\\end{eqnarray}\n", "\n", - "In the logistic regression we are not estimating the properties of the sepal length. Were are merely fitting parameters of the inverse link function. In the LDA model we are estimating the sepal length distributions directly." + "In the logistic regression we are not estimating the properties of the sepal length. We are merely fitting parameters of the inverse link function. In the LDA model we are estimating the sepal length distributions directly." ] }, @@ -1042,7 +1063,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "When comparing the posterior predictive it can be seen that the logistic model is binary, estimating either 0 or 1, versus the LDA model which has real numbers that generally look like sepal lengths. This follows our understandings of the models, the discriminative model can only make predictions as to which class a particular sepal length belongs to, whereas the LDA model can make predictions about the sepal lengths directly." + "When comparing the posterior predictive, it can be seen that the logistic model is binary, estimating either 0 or 1, while the LDA model has real numbers that generally look like sepal lengths. This follows our understanding of the models: the logistic regression makes predictions as to which class a particular sepal length belongs to, whereas the LDA model makes predictions about the sepal lengths directly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 9\n", - "***" + "***\n", + "\n", + "*Using the fish data, extend the `ZIP_reg` model to include the persons variable as part of a linear model. Include this variable to model the number of extra zeros. You should get a model that includes two linear models: one connecting the number of children and the presence/absence of a camper to the Poisson rate (as in the example we saw), and another connecting the number of persons to the $\\psi$ variable. For the second case, you will need a logistic inverse link!*" ] }, { @@ -1108,7 +1131,9 @@ "metadata": {}, "source": [ "## Exercise 10\n", - "***" + "***\n", + "\n", + "*Use the data for the robust logistic example to feed a non-robust logistic regression model and to check that the outliers actually affected the results.
You may want to add or remove outliers to better understand the effect of the estimation on a logistic regression and the robustness of the model introduced in this chapter.*" ] }, { @@ -1132,12 +1157,14 @@ "source": [ "iris = sns.load_dataset(\"iris\") \n", "df = iris.query(\"species == ('setosa', 'versicolor')\") \n", + "\n", "y_0 = pd.Categorical(df['species']).codes \n", "x_n = 'sepal_length' \n", "x_0 = df[x_n].values \n", "y_0 = np.concatenate((y_0, np.ones(6, dtype=int))) \n", "x_0 = np.concatenate((x_0, [4.2, 4.5, 4.0, 4.3, 4.2, 4.4])) \n", "x_c = x_0 - x_0.mean() \n", + "\n", "plt.plot(x_c, y_0, 'o', color='k');" ] }, @@ -1145,7 +1172,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's take the robust logistic regression from the chapter and make it non robust" + "Let's take the robust logistic regression from the chapter and make it non-robust:" ] }, { @@ -1174,9 +1201,8 @@ " θ = pm.Deterministic(\"θ\", pm.math.sigmoid(μ))\n", " bd = pm.Deterministic(\"bd\", -α/β)\n", " \n", - " π = pm.Beta(\"π\",1,1)\n", - " \n", - " # Short Circuit Robust regression \n", + " # Short Circuit Robust regression\n", + " # π = pm.Beta(\"π\", 1, 1)\n", " # p = π *.5 + (1-π)*θ\n", " p = θ\n", " \n", @@ -1293,13 +1319,13 @@ "source": [ "theta = trace_rlg['θ'].mean(axis=0)\n", "idx = np.argsort(x_c)\n", - "plt.plot(x_c[idx], theta[idx], color='C2', lw=3);\n", + "\n", "plt.vlines(trace_rlg['bd'].mean(), 0, 1, color='k')\n", "bd_hpd = az.hpd(trace_rlg['bd'])\n", - "\n", "plt.fill_betweenx([0, 1], bd_hpd[0], bd_hpd[1], color='k', alpha=0.5)\n", "\n", "plt.scatter(x_c, np.random.normal(y_0, 0.02), marker='.', color=[f'C{x}' for x in y_0])\n", + "plt.plot(x_c[idx], theta[idx], color='C2', lw=3)\n", "theta_hpd = az.hpd(trace_rlg['θ'])[idx]\n", "plt.fill_between(x_c[idx], theta_hpd[:,0], theta_hpd[:,1], color='C2', alpha=0.5)\n", "\n", @@ -1316,15 +1342,29 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Compare this plot to figure 4.13. Note that the HPD for the decision boundary is wider, reflecting the additional uncertainty. This is also reflected in the slope which is more gradual. This is reflected both in the plot, but also the beta parameter (15.77 for robust model versus 2.38 for the non robust model)" + "Compare this plot to figure 4.13. Note that the HPD for the decision boundary is wider, reflecting the additional uncertainty. The slope is also more gradual, which shows up both in the plot and in the beta parameter (15.77 for the robust model versus 2.38 for the non-robust model)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 11\n", "***\n", "\n", "*Read and run the following notebooks from PyMC3's documentation:*\n", "\n", "- [GLM: Linear regression](https://docs.pymc.io/notebooks/GLM-linear.html)\n", "- [GLM: Robust Linear Regression](https://docs.pymc.io/notebooks/GLM-robust.html)\n", "- [GLM: Hierarchical Linear Regression](https://docs.pymc.io/notebooks/GLM-hierarchical.html)" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "stat-rethink-pymc3", "language": "python", - "name": "python3" + "name": "stat-rethink-pymc3" }, "language_info": { "codemirror_mode": { @@ -1336,7 +1376,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.7.3" } }, "nbformat": 4,