diff --git a/exercises/Chapter1.ipynb b/exercises/Chapter1.ipynb index 40474ae..dcd9469 100644 --- a/exercises/Chapter1.ipynb +++ b/exercises/Chapter1.ipynb @@ -13,19 +13,13 @@ "metadata": {}, "outputs": [], "source": [ + "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", + "\n", "from scipy import stats\n", - "import arviz as az" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ + "\n", "az.style.use('arviz-darkgrid')" ] }, @@ -41,19 +35,30 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "We do not know whether the brain really works in a Bayesian way, in an approximate Bayesian fashion, or maybe some evolutionary (more or less) optimized heuristics. Nevertheless, we know that we learn by exposing ourselves to data, examples and exercises - well, you may say that humans never learn, given our record as a species on subjects such as wars or economic systems that prioritize profit and not people's well-being... Anyway, I recommend you do the proposed exercises at the end of each chapter.\n", + "\n", + "*From the following expressions, which one corresponds to the sentence \"the probability of being sunny, given that it is the 9th of July of 1816\"?*\n", + "- p(sunny)\n", + "- p(sunny | July)\n", + "- p(sunny | 9th of July of 1816)\n", + "- p(9th of July of 1816 | sunny)\n", + "- p(sunny, 9th of July of 1816) / p(9th of July of 1816)\n", + "\n", + "\n", "There are two statements that correspond to the *Probability of being sunny given that it is the 9th of July of 1816*\n", "\n", "1. p(sunny | 9th of July of 1816)\n", - "2. p(sunny| 9th of July of 1816) / p(9th of July of 1816)\n", + "2. p(sunny, 9th of July of 1816) / p(9th of July of 1816)\n", "\n", "For the second one recall the product rule (Equation 1.1)\n", "\n", - "$$ p(A,B) = p(A|B)p(B)$$\n", + "$$ p(A,B) = p(A|B)p(B) $$\n", "\n", "A rearrangement of this formula yields\n", "\n", - "$$ p(A|B) = \\frac{p(A,B)}{p(B)}$$\n", - "Replace A and B with \"sunny\" and \"9th of July of 1816\" to get the equivament formulation" + "$$ p(A|B) = \\frac{p(A, B)}{p(B)}$$\n", + "\n", + "Replace A and B with \"sunny\" and \"9th of July of 1816\" to get the equivament formulation." ] }, { @@ -68,22 +73,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's assume there are 6 billion humans in the galaxy and there is only 1 Pope, Pope Francis, at time of writing. If a human is randomly picked the chances of that human being the pope are 1 in 6 billion. In mathematical notation\n", + "*Show that the probability of choosing a human at random and picking the Pope is not the same as the probability of the Pope being human.*\n", + "\n", + "Let's assume there are 6 billion humans in the galaxy and there is only 1 Pope, Pope Francis, at the time of this writing. If a human is randomly picked the chances of that human being the pope are 1 in 6 billion. In mathematical notation\n", "\n", "$$ p(Pope | human) = \\frac{1}{6,000,000} $$\n", "\n", "Additionally I am very confident that the Pope is human, so much so that I make this statement. *Given a pope, I am 100% certain they are human*. \n", "Written in math\n", - "$$ p(human | Pope) = 1$$\n", + "$$ p(human | Pope) = 1 $$\n", "\n", + "*In the animated series Futurama, the (space) Pope is a reptile. How does this change your previous calculations?*\n", "\n", - "In animated show Futurama the human pope has been replaced by a reptile pope. As is such.\n", + "Ok then:\n", "\n", "$$ p(Pope | human) = 0 $$\n", "\n", "And \n", "\n", - "$$ p(human | Pope) = 0$$" + "$$ p(human | Pope) = 0 $$" ] }, { @@ -98,7 +106,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The priors in thie model are\n", + "*In the following definition of a probabilistic model, identify the prior and the likelihood:*\n", + "\n", + "\\begin{eqnarray}\n", + "y_i \\text{~} Normal(\\mu, \\sigma) \\newline\n", + "\\mu \\text{~} Normal(0,10) \\newline\n", + "\\sigma \\text{~} HalfNormal(25) \n", + "\\end{eqnarray}\n", + "\n", + "The priors in the model are\n", "\n", "\\begin{eqnarray}\n", "\\mu \\text{~} Normal(0,10) \\newline\n", @@ -106,7 +122,7 @@ "\\end{eqnarray}\n", "\n", "The likelihood in our model is \n", - "$$ y_i \\text{~} Normal(\\mu, \\sigma)$$" + "$$ y_i \\text{~} Normal(\\mu, \\sigma) $$" ] }, { @@ -121,9 +137,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the previous exquestion there are two parameters in the posterior $\\mu$ and $\\sigma$. In our coin flipping model we had one parameter $\\theta$.\n", + "*In the previous model, how many parameters will the posterior have? Compare it with the model for the coin-flipping problem.*\n", + "\n", + "In the previous question there are two parameters in the posterior, $\\mu$ and $\\sigma$. \n", "\n", - "It may seem confusing that we also had *alpha* and *beta* as well, but remember, these were not parameters we were trying ot estimate, or in other words we really don't care about alpha and beta, they were just values for our prior distribution. What we really wanted was $\\theta$ to determine the fairness of the coin." + "In our coin flipping model we had one parameter, $\\theta$. It may seem confusing that we had $\\alpha$ and $\\beta$ as well, but remember, these were not parameters we were trying ot estimate. In other words we don't really care about $\\alpha$ and $\\beta$ - they were just values for our prior distribution. What we really wanted was $\\theta$, to determine the fairness of the coin." ] }, { @@ -138,6 +156,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "*Write Bayes' theorem for the model in question 3.*\n", + "\n", "$$ p(\\mu, \\sigma | y) = \\frac{p(y| \\mu, \\sigma)p(\\mu)p(\\sigma)}{p(y)} $$" ] }, @@ -153,32 +173,34 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Formalizing some of the statements into mathematical notatoin\n", + "*Let's suppose that we have two coins. When we toss the first coin, half the time it lands on tails and the other half on heads. The other coin is a loaded coin that always lands on heads. If we take one of the coins at random and get heads, what is the probability that this coin is the unfair one?*\n", + "\n", + "Formalizing some of the statements into mathematical notation:\n", "\n", - "The probability of picking a coin at random, and getting either the biased or the fair coin is the same \n", + "The probability of picking a coin at random, and getting either the biased or fair coin is the same:\n", "\n", "$$p(Biased) = p(Fair) = .5$$\n", "\n", - "The probabiity of getting a heads with a biased coin is 1,\n", + "The probability of getting heads with the biased coin is 1,\n", "$$p(Heads | Biased) = 1$$\n", "\n", - "The probability of getting a heads with the fair coin is .5\n", + "The probability of getting heads with the fair coin is .5\n", "$$p(Heads | Fair) = .5$$\n", "\n", - "Lastly remember that after picking a coin at **random** we have observed a heads. Therefore we can use Bayes rule to calculate the probability we picked the biased coin\n", + "Lastly, remember that after picking a coin at *random*, we observed heads. Therefore we can use Bayes rule to calculate the probability that we picked the biased coin:\n", "\n", "$$ p(Biased | Heads) = \\frac{p(Heads | Biased) p(Biased)}{p(Heads)} $$\n", "\n", - "To solve this by hand we need to rewrite the denominator using the *Rule of Total Probability*\n", + "To solve this by hand we need to rewrite the denominator using the *Rule of Total Probability*:\n", "\n", "$$ p(Biased | Heads) = \\frac{p(Heads | Biased) p(Biased)}{p(Heads|Fair)*p(Fair) + p(Heads|Biased)*p(Biased)} $$\n", "\n", - "We can use Python to do the math for us" + "We can use Python to do the math for us:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": {}, "outputs": [ { @@ -187,7 +209,7 @@ "0.6666666666666666" ] }, - "execution_count": 3, + "execution_count": 1, "metadata": {}, "output_type": "execute_result" } @@ -200,8 +222,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Question 7\n", - "***" + "## Questions 7 & 8\n", + "***\n", + "\n", + "*Modify the code that generated Figure 1.5, in order to add a dotted vertical line showing the observed rate of $\\frac{Heads}{Number-of-tosses} $. Compare the location of this line to the mode of the posteriors in each subplot.*\n", + "\n", + "*Try re-running this code using other priors (`beta_params`) and other data (`n_trials` and `data`).*" ] }, { @@ -260,8 +286,45 @@ " plt.yticks([])\n", " \n", " \n", - "plt.tight_layout()" + "plt.tight_layout();" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 9\n", + "***\n", + "\n", + "*Go to the chapter's notebook and explore different parameters for the Gaussian, binomial and beta plots (figures 1.1, 1.3 and 1.4 from the chapter). Alternatively, you may want to plot a single distribution instead of a grid of distributions.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 10\n", + "***\n", + "\n", + "*Read about [Cromwell's rule](https://en.wikipedia.org/wiki/Cromwell%27s_rule) on Wikipedia.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 11\n", + "***\n", + "\n", + "*Read about [probabilities and the Dutch book](https://en.wikipedia.org/wiki/Dutch_book) on Wikipedia.*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -280,7 +343,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.7.3" } }, "nbformat": 4, diff --git a/exercises/Chapter2.ipynb b/exercises/Chapter2.ipynb index 3086bc4..922068e 100644 --- a/exercises/Chapter2.ipynb +++ b/exercises/Chapter2.ipynb @@ -11,22 +11,16 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'\n" - ] - } - ], + "outputs": [], "source": [ + "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", - "from scipy import stats\n", - "import arviz as az\n", "import pymc3 as pm\n", + "\n", + "from scipy import stats\n", + "\n", "np.random.seed(123)" ] }, @@ -35,7 +29,9 @@ "metadata": {}, "source": [ "## Question 1\n", - "***" + "***\n", + "\n", + "*Using PyMC3, change the parameters of the prior beta distribution in `our_first_model` to match those of the previous chapter. Compare the results to the previous chapter. Replace the beta distribution with a uniform one in the interval [0,1]. Are the results equivalent to $Beta(\\alpha=1, \\beta=1)$? Is the sampling slower, faster or the same? What about using a larger interval, such as [-1,2]? Does the model run? What errors do you get?*" ] }, { @@ -120,7 +116,7 @@ } ], "source": [ - "az.plot_trace(trace_baseline)" + "az.plot_trace(trace_baseline);" ] }, { @@ -192,7 +188,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Match prior for previous example" + "### Match prior for previous example" ] }, { @@ -220,7 +216,7 @@ " θ = pm.Beta(\"θ\", alpha=1,beta=1)\n", " # likelihood\n", " y = pm.Bernoulli('y', p=θ, observed=data)\n", - " trace = pm.sample(1000, random_seed=123)\n" + " trace = pm.sample(1000, random_seed=123)" ] }, { @@ -287,7 +283,7 @@ } ], "source": [ - "az.plot_trace(trace_uniform_1_1)" + "az.plot_trace(trace_uniform_1_1);" ] }, { @@ -371,14 +367,14 @@ "metadata": {}, "source": [ "### Discussion\n", - "Both priors are produce identical results. This is to be expected as a $Beta(1,1)$ prior is exactly the same as a $Uniform(0,1)$ prior. I encourage you prove it to yourself by plotting both. Furthermore the model sampling time is identical as well." + "Both priors produce identical results. This is to be expected as a $Beta(1,1)$ prior is exactly the same as a $Uniform(0,1)$ prior. I encourage you to prove it to yourself by plotting both. Furthermore, the models sampling times are identical as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Replace the beta distribution with a uniform [-1,2]" + "### Replace the beta distribution with a uniform [-1,2]" ] }, { @@ -442,7 +438,7 @@ } ], "source": [ - "az.plot_trace(trace_uniform_prior_minus_1_2)" + "az.plot_trace(trace_uniform_prior_minus_1_2);" ] }, { @@ -526,17 +522,27 @@ "metadata": {}, "source": [ "### Discussion\n", - "Looking at the mean, sd, and HPD values both results look identical which is great. However there three items that raise concerns. We will cover these all in depth in Chapter 8 but will note them here as part of the exercise.\n", + "Looking at the mean, sd, and HPD values, both results look identical, which is great. However there are three items that raise concerns. We will cover these in depth in Chapter 8 but will note them here as part of the exercise.\n", "\n", - "The first is that there's two new warning messages that look something like _There were 1179 divergences after tuning. Increase `target_accept` or reparameterize._ Here the sampler is warning us that something is going wrong with sampling.\n", + "The first is that there are two new warning messages that look something like `There were 1179 divergences after tuning. Increase target_accept or reparameterize`. Here, the sampler is warning us that something is going wrong with sampling.\n", "\n", "The second is the black bars at the bottom of the traceplot. These also indicate divergences.\n", "\n", - "Lastly when looking at eff_n, which stands for effective number of samples, the $Uniform(-1,2)$ prior has a much lower number than the $Beta(1,1)$, meaning that the number of useful samples was much less than the total number of samples drawn\n", + "Lastly, when looking at eff_n, which stands for effective number of samples, the $Uniform(-1,2)$ prior has a much lower number than the $Beta(1,1)$, meaning that the number of useful samples was much less than the total number of samples drawn.\n", + "\n", + "This makes sense when asking the question \"Can a probabiity be less than 0 or more than 1?\". The answer is no, but in our prior we're asking the sampler to \"test\" prior probability values less than 0 and greater than 1, and when it does so the likelihood function is unable to compute a value.\n", "\n", - "This makes sense when asking the question _Can a probabiity be less than 0 or more than 1?_ The answer is no, but in our prior we're asking the sampler to \"test\" prior probability values less 0 and greater than 1, and when it does so the likelihood function is unable to compute a value.\n", + "Again, we'll cover MCMC diagnostics (and what to do) in much more detail in Chapter 8, but for now it is sufficient to be able to recognize these warnings when they appear." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Question 2\n", + "***\n", "\n", - "Again we'll cover MCMC diagnostics, and what to do, in much more detail in Chapter 8, but for now it is sufficient to be able to recognize these warnings when they appear." + "*Read about the [coal mining disaster model](https://docs.pymc.io/notebooks/getting_started.html#Case-study-2:-Coal-mining-disasters) that is part of the PyMC3 documentation. Try to implement and run this model by yourself.*" ] }, { @@ -544,7 +550,9 @@ "metadata": {}, "source": [ "## Question 3\n", - "***" + "***\n", + "\n", + "*Modify `model_g`: change the prior for the mean to a Gaussian distribution centered at the empirical mean, and play with a couple of reasonable values for the standard deviation of this prior. How robust/sensitive are the inferences to these changes? What do you think of using a Gaussian, which is an unbounded distribution (goes from $-\\infty$ to $+\\infty$), to model bouded data such as this? Remember that we said it is not possible to get values below 0 or above 100.*" ] }, { @@ -589,7 +597,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next let's create a model with a prior parametrized by $\\mu \\text{~} N(53, \\sigma)$. Since we want to test the effect of various $\\sigma$ values. We'll assume that values of [1.5, 3, 5] are reasonable priors parameters. We'll also add one unreasonable $sigma$ of 100 for comparison.\n", + "Next, let's create a model with a prior parametrized by $\\mu \\text{~} N(53, \\sigma)$. Since we want to test the effect of various $\\sigma$ values, we'll assume that values of [1.5, 3.0, 5.0] are reasonable priors parameters. We'll also add one unreasonable $sigma$ of 1000 for comparison.\n", "\n", "A benefit of Python is that we can create a loop to run three models and compare results." ] @@ -633,7 +641,7 @@ "for sd_prior in sd_priors:\n", " with pm.Model() as model_g:\n", " # Modified prior to Gaussian\n", - " μ = pm.Normal('μ_prior_{}'.format(sd_prior), mu=empirical_mean, sd=sd_prior)\n", + " μ = pm.Normal(f'μ_prior_{sd_prior}', mu=empirical_mean, sd=sd_prior)\n", " σ = pm.HalfNormal('σ', sd=10)\n", " y = pm.Normal('y', mu=μ, sd=σ, observed=data)\n", " trace_g = pm.sample(1000)\n", @@ -785,24 +793,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Looking at the summaries the combination of model and inference technique seem quite robust to the changes. Even with a prior that is 300x larger than the empirical prior the posterior values converge to approximately the same result. Computationally there seems to be very little difference.\n", + "Looking at the summaries, the combination of model and inference technique seem quite robust to the changes. Even with a prior that is 300x larger than the empirical prior, the posterior values converge to approximately the same result. Computationally there seems to be very little difference.\n", "\n", - "Logically however there could be some question of the choice of an unbounded prior. Since it is not possible to values that below 0 or above 100, it doesn't make practical sense to have a prior that exists for those values. Luckily though with modern inference methods such as NUTS, the samples can \"bypass\" questionable priors and still get a good approximation of the posterior." + "Logically however, there could be some question about the choice of an unbounded prior. Since it is not possible to get values below 0 or above 100, it doesn't make practical sense to have a prior that exists for those values. Luckily though, with modern inference methods such as NUTS, the samples can \"bypass\" questionable priors and still get a good approximation of the posterior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Question 4\n", - "***" + "## Question 4\n", + "***\n", + "\n", + "*Using the data in chemical_shifts.csv, compute the empirical mean and the standard deviation with and without outliers. Repeat the exercise by adding more outliers. Compare those results to the Bayesian estimations using the Gaussian and Student's t distributions from the chapter.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's first compute the mean and standard deviation without removing any data" + "Let's first compute the mean and standard deviation without removing any data:" ] }, { @@ -831,16 +841,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then let's let's identify outliers by using the _2x standard deviation_ methodology" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "empirical_mean,empirical_std = np.mean(data), np.std(data)" + "Then let's let's identify outliers by using the _2x standard deviation_ methodology:" ] }, { @@ -868,7 +869,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Removing those two values let recompute the mean and standard deviation" + "Removing those two values, let's recompute the mean and standard deviation:" ] }, { @@ -898,14 +899,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Notice that the mean has dropped from 53.49 to 52.9, and the standard deviatipn has dropped from 3.42 to 2.19. Logically this makes sense as the data is \"less spread out\" when we don't include outliers." + "Notice that the mean has dropped from 53.49 to 52.95, and the standard deviation has dropped from 3.42 to 2.19. Logically this makes sense as the data is \"less spread out\" when we don't include outliers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's repeat the exercise but add more outliers. We can do this by repeating the previously identified outliers a couple more times" + "Let's repeat the exercise but add more outliers. We can do this by repeating the previously identified outliers a couple more times:" ] }, { @@ -966,24 +967,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The mean and the standard deviation both go up in this case. This intuitively makes sense because the the distribution needs to \"stretch\" to include these additional data points that are farther away from mean." + "The mean and standard deviation both go up in this case. This intuitively makes sense because the distribution needs to \"stretch\" to include these additional data points that are farther away from the mean." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Question 5\n", - "***" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "tips = pd.read_csv('../code/data/tips.csv')" + "## Question 5\n", + "***\n", + "\n", + "*Modify the tips example to make it robust to outliers. Try with one shared $\\nu$ for all groups and also with one $\\nu$ per group. Run posterior predictive checks to assess these three models.*" ] }, { @@ -992,6 +986,7 @@ "metadata": {}, "outputs": [], "source": [ + "tips = pd.read_csv('../code/data/tips.csv')\n", "tip = tips['tip'].values\n", "idx = pd.Categorical(tips['day'],\n", " categories=['Thur', 'Fri', 'Sat', 'Sun']).codes\n", @@ -1025,10 +1020,11 @@ ], "source": [ "with pm.Model() as comparing_groups:\n", - " μ = pm.Normal('μ', mu=0, sd=10, shape=groups)\n", - " σ = pm.HalfNormal('σ', sd=10, shape=groups)\n", - " ν = pm.Exponential('ν', 1/30)\n", + " μ = pm.Normal('μ', mu=0., sd=10., shape=groups)\n", + " σ = pm.HalfNormal('σ', sd=10., shape=groups)\n", + " \n", " y = pm.Normal('y', mu=μ[idx], sd=σ[idx], observed=tip)\n", + " \n", " trace_normal = pm.sample(5000)\n", " y_pred_normal = pm.sample_posterior_predictive(trace_normal, 100)" ] @@ -1064,7 +1060,7 @@ "source": [ "tips_normal = az.from_pymc3(trace=trace_normal, posterior_predictive=y_pred_normal)\n", "axes = az.plot_ppc(tips_normal, figsize=(12,8), mean=False)\n", - "axes[0].set_title(\"Posterior Predictive Check for Normal Tips Model\")" + "axes[0].set_title(\"Posterior Predictive Check for Normal Tips Model\");" ] }, { @@ -1097,6 +1093,7 @@ " μ = pm.Normal('μ', mu=0, sd=10, shape=groups)\n", " σ = pm.HalfNormal('σ', sd=10, shape=groups)\n", " ν = pm.Exponential('ν', 1/30)\n", + " \n", " y = pm.StudentT('y', mu=μ[idx], sd=σ[idx], nu=ν, observed=tip)\n", " \n", " trace_nu_shared = pm.sample(5000)\n", @@ -1135,14 +1132,14 @@ "tips_nu_shared = az.from_pymc3(trace=trace_nu_shared, posterior_predictive=y_pred_nu_shared)\n", "axes = az.plot_ppc(tips_nu_shared, figsize=(12,8), mean=False)\n", "axes[0].set_title(\"Posterior Predictive Check for Student T Shared ν Tips Model\")\n", - "axes[0].set_xlim(-4.5,10)" + "axes[0].set_xlim(-4.5,10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### $\\nu$ per group" + "### One $\\nu$ per group" ] }, { @@ -1172,6 +1169,7 @@ " μ = pm.Normal('μ', mu=0, sd=10, shape=groups)\n", " σ = pm.HalfNormal('σ', sd=10, shape=groups)\n", " ν = pm.Exponential('ν', 1/30, shape=groups)\n", + " \n", " y = pm.StudentT('y', mu=μ[idx], sd=σ[idx], nu=ν[idx], observed=tip)\n", " \n", " trace_nu_per_group = pm.sample(5000)\n", @@ -1210,31 +1208,24 @@ "tips_nu_per_group = az.from_pymc3(trace=trace_nu_per_group, posterior_predictive=y_pred_nu_per_group)\n", "axes = az.plot_ppc(tips_nu_per_group, figsize=(12,8), mean=False)\n", "axes[0].set_title(\"Posterior Predictive Check for Student T ν per group\")\n", - "axes[0].set_xlim(-10,10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Question 6\n", - "***" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To calculate Cohen's D using 3 steps\n", - "1. Fitting a distribution of parameters to our data\n", - "2. Generating a posterior predictive distribution of our groups\n", - "3. Picking data points at random from each group and comparing from the datapoint in group" + "axes[0].set_xlim(-10,10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "## Question 6\n", + "***\n", + "\n", + "*Compute the probability of superiority directly from the posterior (without computing Cohen's d first). You can use the `pm.sample_posterior_predictive()` function to take a sample from each group. Is it really different from the calculation assuming normality? Can you explain the result?*\n", + "\n", + "The goal is to calculate Cohen's D numerically. We'll do that in 3 steps:\n", + "\n", + "1. Fit a distribution of parameters to our data\n", + "2. Generate posterior predictive distributions for our groups\n", + "3. Pick data points at random from each group and compare the datapoints in each group\n", + "\n", "### Step 1: Estimate model parameters per group" ] }, @@ -1244,7 +1235,6 @@ "metadata": {}, "outputs": [], "source": [ - "\n", "tips = pd.read_csv('../code/data/tips.csv')\n", "tip = tips['tip'].values\n", "idx = pd.Categorical(tips['day'],\n", @@ -1274,6 +1264,7 @@ "with pm.Model() as comparing_groups:\n", " μ = pm.Normal('μ', mu=0, sd=10, shape=groups)\n", " σ = pm.HalfNormal('σ', sd=10, shape=groups)\n", + " \n", " y = pm.Normal('y', mu=μ[idx], sd=σ[idx], observed=tip)\n", " \n", " trace_cg = pm.sample(5000)\n", @@ -1427,10 +1418,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Step 2: Get posterior predictive values per group\n", - "In our case let's see if Sunday is superior to Thursday in regards to tips. \n", + "### Step 2: Get posterior predictive values per group\n", + "In our case, let's see if Sunday is superior to Thursday in regards to tips. \n", "\n", - "In our code recall a couple items\n", + "In our code, recall that:\n", "* We have 244 items\n", "* We encoded the string labels of days into \"codes\" or integers\n", "* Sunday was encoded as 3, Thursday was encoded as 0" @@ -1442,7 +1433,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Get our models predictions for possible tips for Thursday and Sunday\n", + "# Get our model's predictions for possible tips on Thursday and Sunday\n", "posterior_predictive_thursday = ppc_cg[\"y\"][:,idx==0].flatten()\n", "posterior_predictive_sunday = ppc_cg[\"y\"][:,idx==3].flatten()" ] @@ -1451,7 +1442,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Just to get a sense of the distributions let's plot the two distributions. From visual inspection it's hard to see the difference between the two groups." + "Just to get a sense of the distributions, let's plot them:" ] }, { @@ -1483,7 +1474,7 @@ } ], "source": [ - "az.plot_kde(posterior_predictive_thursday)" + "az.plot_kde(posterior_predictive_thursday);" ] }, { @@ -1515,13 +1506,17 @@ } ], "source": [ - "az.plot_kde(posterior_predictive_sunday)" + "az.plot_kde(posterior_predictive_sunday);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "From visual inspection, it's hard to see the difference between the two groups.\n", + "\n", + "### Step 3: Pick data points at random from each group and compare\n", + "\n", "Let's numerically simulate Cohen's D by drawing random samples from both posterior predictive distributions and naively calculating probability" ] }, @@ -1557,9 +1552,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - ".394 is very similar to the calculatio nwe obtained in Figure 2.18. This means that if given a tip on thursday there is only 39.4% chance that it will be higher than a tip on Sunday. In other words this means you should sign up for Sunday shifts if you're looking to make extra money! I encourage you to try changing the idx values to compare other groups.\n", + ".397 is very similar to the calculation we obtained in Figure 2.18. This means that, given a tip on Thursday, there is only a 4 in 10 chance that it will be higher than a tip on Sunday. In other words, you should sign up for Sunday shifts if you're looking to make extra money! I encourage you to try changing the idx values to compare other groups.\n", "\n", - "For reference another more efficient way to compute Cohen's D is possible with numpy broadcasting. The code below is functionally identical to the code above, but by using numpy broadcasting we can write less code and the computation is completed more efficiently." + "For reference, another, more efficient way to compute Cohen's D is possible with numpy broadcasting. The code below is functionally identical to the code above, but by using numpy broadcasting we can write less code and the computation is completed more efficiently:" ] }, { @@ -1589,8 +1584,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Question 7\n", - "***" + "## Question 7\n", + "***\n", + "\n", + "*Repeat the exercise we did with `model_h`. This time, without hierarchical structure, use a flat prior such as $Beta(1,1)$. Compare the results of both models.*" ] }, { @@ -1605,7 +1602,7 @@ "\n", "group_idx = np.repeat(np.arange(len(N_samples)), N_samples)\n", "data = []\n", - "for i in range(0, len(N_samples)):\n", + "for i in range(len(N_samples)):\n", " data.extend(np.repeat([1, 0], [G_samples[i], N_samples[i]-G_samples[i]]))" ] }, @@ -1631,8 +1628,10 @@ "# Baseline model\n", "with pm.Model() as model_h:\n", " μ = pm.Beta('μ', 1., 1.)\n", - " κ = pm.HalfNormal('κ', 10)\n", + " κ = pm.HalfNormal('κ', 10.)\n", + " \n", " θ = pm.Beta('θ', alpha=μ*κ, beta=(1.0-μ)*κ, shape=len(N_samples))\n", + " \n", " y = pm.Bernoulli('y', p=θ[group_idx], observed=data)\n", " \n", " trace_h = pm.sample(2000)\n", @@ -1678,7 +1677,7 @@ } ], "source": [ - "az.plot_trace(trace_h)" + "az.plot_trace(trace_h);" ] }, { @@ -1712,7 +1711,7 @@ } ], "source": [ - "az.plot_forest(trace_h)" + "az.plot_forest(trace_h);" ] }, { @@ -1855,7 +1854,7 @@ ], "source": [ "baseline_model = az.from_pymc3(trace=trace_h, posterior_predictive=ppc_h)\n", - "az.plot_ppc(baseline_model, figsize=(12,6), mean=False)" + "az.plot_ppc(baseline_model, figsize=(12,6), mean=False);" ] }, { @@ -1880,12 +1879,14 @@ "# Flat model\n", "with pm.Model() as model_h:\n", " μ = pm.Beta('μ', 1., 1.)\n", - " κ = pm.HalfNormal('κ', 10)\n", + " κ = pm.HalfNormal('κ', 10.)\n", + " \n", " θ = pm.Beta('θ', alpha=μ*κ, beta=(1.0-μ)*κ)\n", + " \n", " y = pm.Bernoulli('y', p=θ, observed=data)\n", " \n", " trace_non_h = pm.sample(2000)\n", - " ppc_non_h = pm.sample_posterior_predictive(trace_non_h, samples=500)\n" + " ppc_non_h = pm.sample_posterior_predictive(trace_non_h, samples=500)" ] }, { @@ -1923,7 +1924,7 @@ } ], "source": [ - "az.plot_trace(trace_non_h)" + "az.plot_trace(trace_non_h);" ] }, { @@ -1957,7 +1958,7 @@ } ], "source": [ - "az.plot_forest(trace_non_h)" + "az.plot_forest(trace_non_h);" ] }, { @@ -2078,14 +2079,14 @@ ], "source": [ "flat_model = az.from_pymc3(trace=trace_non_h, posterior_predictive=ppc_non_h)\n", - "az.plot_ppc(flat_model, figsize=(12,6))" + "az.plot_ppc(flat_model, figsize=(12,6));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Summary comparison" + "#### Summary comparison" ] }, { @@ -2234,16 +2235,19 @@ "metadata": {}, "source": [ "#### Discussion\n", - "Note that in the hiearchial model we get three estimates of $\\theta$, one per each group. By using a hiearchial model we're able to obtain the range of $\\theta$ parameters per group, and not just the average $\\theta$ parameter for all groups" + "In the hiearchical model, we get three estimates of $\\theta$, one per each group. So, by using a hierarchical model, we're able to obtain the range of $\\theta$ parameters per group, and not just the average $\\theta$ parameter for all groups." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Question 8\n", + "## Question 8\n", "***\n", - "Refer to Question 6 for non pooled version of model" + "\n", + "*Create a hierarchical version of the tips example by partially pooling across the days of the week. Compare the results to those obtained without the hierarchical structure.*\n", + "\n", + "Refer to Question 6 for the non-pooled version of the model." ] }, { @@ -2268,11 +2272,11 @@ "with pm.Model() as pooled_mu_tips:\n", " \n", " # Pooled Expected mean of tips\n", - " pooled_mean = pm.Normal('pooled_mean', mu=0, sd=10)\n", - " μ = pm.Normal(\"μ\", mu=pooled_mean, sd=1, shape=groups)\n", + " pooled_mean = pm.Normal('pooled_mean', mu=0., sd=10.)\n", + " μ = pm.Normal(\"μ\", mu=pooled_mean, sd=1., shape=groups)\n", " \n", " # Unpooled Standard Deviation of tips\n", - " σ = pm.HalfNormal('σ', sd=10, shape=groups)\n", + " σ = pm.HalfNormal('σ', sd=10., shape=groups)\n", "\n", " y = pm.Normal('y', mu=μ[idx], sd=σ[idx], observed=tip)\n", " \n", @@ -2327,7 +2331,7 @@ } ], "source": [ - "az.plot_trace(trace_pooled_tips)" + "az.plot_trace(trace_pooled_tips);" ] }, { @@ -2755,7 +2759,7 @@ } ], "source": [ - "az.plot_forest(pooled_tips_dataset)" + "az.plot_forest(pooled_tips_dataset);" ] }, { @@ -2789,14 +2793,14 @@ } ], "source": [ - "az.plot_forest(flat_tips)" + "az.plot_forest(flat_tips);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Question 9\n", + "## Question 9\n", "***" ] }, @@ -2804,7 +2808,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To generate graphs you may need to install grapviz using the command `conda install -c conda-forge python-graphviz`" + "*PyMC3 can create **directed acyclic graphs (DAGs)** from models that are very similar to Kruschke's diagrams. You can obtain them using the `pm.model_to_graphviz()` function. To generate DAGs, you may need to install graphviz using the command `conda install -c conda-forge python-graphviz`. Generate a DAG for each model in this chapter.*" ] }, { @@ -2884,6 +2888,20 @@ "source": [ "pm.model_to_graphviz(pooled_mu_tips)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*A last note before finishing up: besides the exercises you'll find at the end of each chapter, you can always try to (and probably should) think of problems you are interested in and how to apply what you have learned to that problem. Maybe you will need to define your problem in a different way, or maybe you will need to expand or modify the models you have learned. If you think this task is beyond your current skills, note down the problem and come back to it after reading another chapter in this book. Eventually, if the book does not answer your questions, check the [PyMC3 examples](https://docs.pymc.io/nb_examples/index.html) or ask a question on [PyMC's Discourse](https://discourse.pymc.io/).*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -2902,7 +2920,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.7.3" } }, "nbformat": 4,