diff --git a/authors/index.xml b/authors/index.xml
index fa4e287..ff472fd 100644
--- a/authors/index.xml
+++ b/authors/index.xml
@@ -19,7 +19,7 @@
 https://lucianopaz.github.io/authors/lucianopaz/
 <p>I&rsquo;m currently a machine learning engineer working at Innova SpA in Trieste, Italy.</p>
-<p>In a past lift I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.</p>
+<p>In a past life I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.</p>
 <p>I&rsquo;ve worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.</p>
diff --git a/authors/lucianopaz/index.html b/authors/lucianopaz/index.html
index 5d007a5..bffefca 100644
--- a/authors/lucianopaz/index.html
+++ b/authors/lucianopaz/index.html
@@ -504,7 +504,7 @@

I'm currently a machine learning engineer working at Innova SpA in Trieste, Italy.

-In a past lift I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.
+In a past life I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.

I've worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.

diff --git a/index.html b/index.html
index 7d4061d..b3466a0 100644
--- a/index.html
+++ b/index.html
@@ -542,7 +542,7 @@

About me

I'm currently a machine learning engineer working at Innova SpA in Trieste, Italy.

-In a past lift I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.
+In a past life I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.

I've worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.

diff --git a/index.json b/index.json index c49111c..8f4b243 100644 --- a/index.json +++ b/index.json @@ -1 +1 @@ -[{"authors":["lucianopaz"],"categories":null,"content":"I\u0026rsquo;m currently a machine learning engineer working at Innova SpA in Trieste, Italy.\nIn a past lift I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.\nI\u0026rsquo;ve worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.\nI\u0026rsquo;m also part of the development team of the pymc probabilistic programming packages written in python.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"cd61adf50213a72955c4ccbeccb26c34","permalink":"https://lucianopaz.github.io/authors/lucianopaz/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/authors/lucianopaz/","section":"authors","summary":"I\u0026rsquo;m currently a machine learning engineer working at Innova SpA in Trieste, Italy.\nIn a past lift I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.\nI\u0026rsquo;ve worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.\nI\u0026rsquo;m also part of the development team of the pymc probabilistic programming packages written in python.","tags":null,"title":"Luciano Paz","type":"authors"},{"authors":null,"categories":["ppl"],"content":" This post is intended to explain:\n What the shape attribute of a pymc3 RV is. What\u0026rsquo;s the difference between an RV\u0026rsquo;s and its associated distribution\u0026rsquo;s shape. How does a distribution\u0026rsquo;s shape determine the shape of its logp output. The potential trouble this can bring with samples drawn from the prior or from the posterior predictive distributions. The many implicit semantically relevant shapes associated with pymc3 models. We will follow the naming convention used by tensorflow probability and talk about sample, batch and event shapes. I plan to write a separate post with some best practices to avoid falling into the many circles of shape hell (or at least be able to maneuver inside them). The origin of all evil ambiguity Before going into the depths of PyMC3, we\u0026rsquo;ll take a detour around numpy\u0026rsquo;s random number generators to introduce the many semantically relevant shapes involved with random variables, samples drawn from probability distributions and how these affect the computed probability densities.\nnumpy has no concept of a random variable class, it provides some random number generators from which you can draw some samples that follow a particular probability distribution (I\u0026rsquo;m purposely not mentioning scipy.stats here). Lets draw some numbers using numpy.random.normal:\nimport numpy as np np.random.seed(12345) # Lets make a call using all the defaults x = np.random.normal() print(\u0026quot;Sample's value = {}\\nSample's shape = {}\u0026quot;.format(x, np.array(x).shape)) Sample's value = -0.20470765948471295 Sample's shape = () You can see that if we call the random number generator with empty arguments we get a single number, or more technically a scalar. 
We actually have to make a zero-sized array by wrapping the result in np.array(x) to be able to get a shape attribute out of it.\nSupplying the size argument we can control the number of draws we want to get. Well, we actually control the shape of the output array.\n# Lets make a call asking for a particular size x = np.random.normal(size=10) print(\u0026quot;Sample's value = {}\\nSample's shape = {}\u0026quot;.format(x, x.shape)) Sample's value = [ 0.47894334 -0.51943872 -0.5557303 1.96578057 1.39340583 0.09290788 0.28174615 0.76902257 1.24643474 1.00718936] Sample's shape = (10,) # Lets make a call asking for a tuple size x = np.random.normal(size=(10, 4, 2)) x.shape (10, 4, 2) Ok, so we control the number of draws we want to make using the size argument. The result of a call to the random number generator can return an ndarray or a scalar. But what happens if we change the distribution parameters, loc and scale?\n# Lets make a call supplying an array of locs x = np.random.normal(loc=np.arange(10)) x.shape (10,) We now also get an ndarray instead of a scalar! We get as many numbers as elements in the supplied loc distribution parameter. So what does size do now?\n# Lets make a call supplying an array of locs and asking for a given size x = np.random.normal(loc=np.arange(10), size=10) x.shape (10,) Wait, so what did size do here? The answer is essentially nothing!\nDistribution\u0026rsquo;s shape What is actually happening is that the distribution parameters, like loc and scale of a normal, implicitly define what we could call the distribution\u0026rsquo;s shape (when we get to multivariate distributions we will split this into batch_shape and event_shapes as in tensorflow probability, but for now lets stick to distribution_shape). The random number generator draws samples using the distribution\u0026rsquo;s implicit shape, but also vectorizes the random calls over the extra dimensions provided in size. 
If the distribution cannot broadcast with the supplied size, then an error is raised (it\u0026rsquo;s actually more pedantic than simple broadcasting).\n# size and loc broadcast so the random draws are vectorized over the extra dimensions of size x = np.random.normal(loc=np.arange(10), size=(300, 10)) x.shape (300, 10) # Each column has a different loc np.mean(x, axis=0) array([0.01963069, 0.95968487, 2.02679069, 2.99794185, 4.00357886, 4.98255339, 6.08488072, 6.895649 , 8.07455776, 9.02202938]) # size does not broadcast with loc, and an error is raised x = np.random.normal(loc=np.arange(10), size=3) x.shape --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-9-6ec74377ffcd\u0026gt; in \u0026lt;module\u0026gt;() 1 # size does not broadcast with loc, and an error is raised ----\u0026gt; 2 x = np.random.normal(loc=np.arange(10), size=3) 3 x.shape mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: shape mismatch: objects cannot be broadcast to a single shape # size and loc broadcast so the random draws are vectorized over the extra dimensions of size x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(10, 300)) x.shape (10, 300) # Each row has a different loc np.mean(x, axis=1) array([0.02937006, 0.98651717, 1.9862869 , 3.00312987, 4.017759 , 4.92210904, 6.01890166, 7.0359308 , 7.98685293, 8.88494283]) # size and loc broadcast but the non singleton dimensions in loc do not match # with the correspoding size element so an error is raised x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(1, 300)) x.shape --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-12-1b49ca5414a0\u0026gt; in \u0026lt;module\u0026gt;() 1 # size and loc broadcast but the non singleton dimensions in loc do not match 2 # with the correspoding size element so an error is raised ----\u0026gt; 3 x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(1, 300)) 4 x.shape mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: size is not compatible with inputs # The distribution's implictly defined shape depends on how the parameters # loc and scale broadcast with each other x = np.random.normal( loc=np.arange(3).reshape((3, 1)), scale=np.arange(4) + 1, size=(3000, 3, 4) ) x.shape (3000, 3, 4) # The second axis has different means and the last axis has different standard deviations np.mean(x, axis=0) array([[ 0.0102261 , -0.00975594, -0.0082977 , 0.11634068], [ 0.9840894 , 0.94276502, 0.93028859, 0.968283 ], [ 2.00062043, 1.99421095, 2.11480227, 1.83806223]]) np.std(x, axis=0) array([[0.98598852, 2.02771629, 2.98777904, 3.94358108], [1.01194307, 2.00820371, 3.0372083 , 3.94933656], [1.00660898, 2.02210172, 3.06703822, 3.98733117]]) Pointers that we need to understand so far: Distributions have parameters. These parameters can implicitly determine the distribution\u0026rsquo;s shape. Samples drawn from a distribution are ndarrays that have some shape which is controlled with the size of the draw and the distribution\u0026rsquo;s implicit shape. Any operation that is performed on a sample drawn from a probability distribution does not actually interact with the distribution\u0026rsquo;s shape, but with something that is determined by size along with the distribution\u0026rsquo;s shape. The word shape is shamelessly used all over the place! 
This is maybe the prime source of confusion in all things shape related. Multivariate distributions Up until now we only dealt with the normal distribution\u0026rsquo;s random number generator. This is a scalar distribution. What this means is that a single sample drawn from a scalar distribution is also a scalar. On the other hand, multivariate distributions, such as the Multinomial, Multivariate Normal and Wishart, amongst others, yield vectors or higher ranked tensors as their single samples. To make this more precise, we will to refer to a single sample yielded by a single distribution as an event, and its shape will be called the event_shape (we call them events because we follow tensorflow probability\u0026rsquo;s naming convention for shapes).\n Scalar distributions have an event_shape == (). Multivariate distributions have non empty event_shapes. But what happened when we passed a loc vector to the normal distribution? Did we somehow make a multivariate distribution implicitly? The answer is no, we did not make a multivariate distribution, we made a batch of many scalar distribution. If you\u0026rsquo;re confused, don\u0026rsquo;t worry. The difference between a batch and a multivariate distribution is not clear by just looking at the shapes of samples drawn from them. For example, if we have a batch of two scalar normal distributions, and compare it to a bivariate normal distribution, both have the same distribution shape, (2,). The difference will only become clear when we look at how the distribution\u0026rsquo;s logp is computed. To do this, we\u0026rsquo;ll rely on scipy.stats which provides a uniform API for many probability distributions and there logpdf methods.\nfrom scipy import stats loc = np.arange(2) scale = np.ones(2) observed = np.arange(2) batch_logp = stats.norm.logpdf(observed, loc, scale) event_logp = stats.multivariate_normal.logpdf(observed, loc, scale) print(\u0026quot;Batch of normals logp shape = {}\u0026quot;.format(batch_logp.shape)) print(\u0026quot;Multivariate normal logp shape = {}\u0026quot;.format(event_logp.shape)) print(\u0026quot;Batch of normals total logp = {}\u0026quot;.format(np.sum(batch_logp))) print(\u0026quot;Multivariate normal total logp = {}\u0026quot;.format(np.sum(event_logp))) Batch of normals logp shape = (2,) Multivariate normal logp shape = () Batch of normals total logp = -1.8378770664093453 Multivariate normal total logp = -1.8378770664093453 The distribution\u0026rsquo;s logp measures the probability density (or mass for discrete distributions) that a sample was drawn from it. For the batch of normals, each element in the batch shape corresponds to an independent observation that should be measured according to an independent distribution in the batch. For this reason, we get (2,) numbers returned by the batch of gaussian distributions. On the other hand, the bivariate gaussian has event_shape == (2,). The entire axis of the event_shape is consumed with the logp calculation.\nGenerally, it is possible to write a batch of multivariate distributions. The resulting distribution shape is the combination of the batch and event shapes.\nAs a final summary on the shape semantics: The event_shape is the shape of a single draw from the unbatched distribution. For scalar distributions, this is an empty tuple (). The batch_shape describes the shape of a draw from a batch of many distributions (like what we did with the normal draws when we supplied many loc and scales). 
The sample_shape describes the shape of the independent, identically distributed (IID) draws made from a distribution or batch of distributions. A distribution\u0026rsquo;s shape is made up by a batch shape and an event shape: distribution_shape == batch_shape + event_shape. When you draw random samples from a distribution, the resulting array has the following shape: sample_shape + batch_shape + event_shape When you compute the logp of an observed sample, the event part of the shape is consumed and the output\u0026rsquo;s shape will be sample_shape + batch_shape. The three shape related concepts are a bit complicated to grasp so I recommend that you read Eric Ma\u0026rsquo;s very nice explanation on them or just look at TFP\u0026rsquo;s colab on them.\nShapes in PyMC3 PyMC3 is intended to be a user friendly modeling package in python. It aims to make probabilistic programming easy by providing an intuitive syntax to describe data generative processes and out-of-the-box Markov Chain Monte Carlo (MCMC) methods to make inferences from observed data. It is designed with inference in mind. It uses theano as its computational backend to be able to express mathematical operations on random variables symbolically. pymc3\u0026rsquo;s design has lead to a lot of ambiguity on the what shape actually means, and how it affects sampling under different circumstances. The main reasons for the shape ambiguity in pymc3 are:\n The three shape concepts (sample, batch and event shapes) are never explicitly disambiguated in the Distribution nor in the RV classes. There is no distinction between batch and event shapes, users can only set the distribution\u0026rsquo;s shape, which is an implicit combination of both. We deal with theano.tensor that represent mathematical operations over random variates. These operations hide most of the semantical batch and event shapes under a single distribution shape in the best case, and as a symbolic shape that is unknown until run time in the worst case. The mathematical operations may work well for the model definition when one only needs to pm.sample, but might run into shape problems when running pm.sample_prior_predictive and pm.sample_posterior_predictive. Model logp reduces the random variable\u0026rsquo;s logp outputs using theano.tensor.sum so the sample-batch shape information is lost for sampling. The shape of ObservedRVs is partly set by the observed argument, but this is not equivalent to the distribution\u0026rsquo;s shape, leading to an ambiguous and confusing situation for users when they define their models, and potential errors when using sample_prior_predictive. The distribution\u0026rsquo;s batch and event shape is not inferred from the supplied parameters, one must always specify the distribution\u0026rsquo;s shape during model construction. Random variable\u0026rsquo;s shapes and problems with sampling To highlight some of these complications, we have to look at data generative models. Random variables are used to define data generation processes like the following:\n$$ \\beta_{G} \\sim \\mathcal{N}(0, 1)\\\\\n\\gamma \\sim \\mathcal{\\Gamma}(10, 10)\\\\\nz_{i} \\sim \\mathcal{N}(0, 1)\\\\\n\\beta_{i} = \\beta_{G} + \\gamma z_{i}\\\\\n\\sigma \\sim \\mathcal{\\Gamma}(3, 3)\\\\\ny \\sim \\mathcal{N}\\left(\\vec{X}. \\vec{\\beta}, \\sigma\\right) $$\nThis is a simple hierarchical model usually used for linear regressions with partial pooling. The key pieces of information for end users are: $y$ and $\\vec{X}$. These are the actually observed values. 
$y$ represents an observed output of the assumed data generation process, and $\\vec{X}$ would be the corresponding input. We assume $\\vec{X}$ is a vector because it can have more than one feature that is used to generate the output observation. In general, we have more than a single observation, in thoses cases $\\vec{y}$ is a vector (or 1D array) and $\\mathbb{X}$ is a matrix (or 2D array, made from vertically stacking many $\\vec{X}$).\nThe rest of the parameters define the unobserved random variables that we will try to infer given the set of observations. $\\sigma$ is the observation noise. $\\vec{\\beta}$ are the regression weights, and they are a summation of two other sets of random variables: $\\beta_{G}$, $\\gamma$ and $\\vec{z}$.\nWhen we do MCMC, we propose candidate values for the random variables based on the model\u0026rsquo;s logp. We use theano and our RVs are essentially placeholders or Variables whose values are unknown when the model is defined, but are updated during sampling. The only thing that is fixed during the model\u0026rsquo;s construction is what we will call the distribution\u0026rsquo;s core dimensions, ndim, which is a fancy way of calling the axes defined by batch_shape and event_shape. MCMC sampling in pymc3 operates on single samples from distributions and only the logp shape handling of ObservedRVs can bring shape related trouble (like what happened in this issue with Categorical.logp).\nOn the other hand, when we do forward sampling (i.e. sample_prior_predictive and sample_posterior_predictive) the same data generation model that was defined only taking into account the core dimensions, has to be used with the extra dimensions given by sample_shape. This leads to a whole world of pain which we will illustrate now using only numpy to write the data generation model:\n# This implementation uses the argument size incorrectly def unvectorized_prior_sampler(X=np.arange(3), size=None): normal = np.random.normal beta_g = normal(0, 1, size=size) gamma = np.random.gamma(10, 10, size=size) z = normal(np.zeros(X.shape[-1]), 1, size=size) beta = beta_g + gamma * z sigma = np.random.gamma(3, 3, size=size) return normal(np.dot(X, beta), sigma, size=size) So lets try out our nice little unvectorized_prior_sampler using only the default size=None. This means that\n We will be drawing a single sample from each distribution Said sample will have the distribution\u0026rsquo;s implicit shape All mathematical operations will be performed on the resulting floats or ndarrays that are returned as the distributions samples.\nfrom matplotlib import pyplot as plt import seaborn as sns plt.ion() sns.distplot([unvectorized_prior_sampler() for _ in range(5000)]); Nice! Everything works! 
Now lets try to do the same but taking advantage of size to draw multiple values at once and vectorize operations\nunvectorized_prior_sampler(size=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-20-30082f0e1390\u0026gt; in \u0026lt;module\u0026gt;() ----\u0026gt; 1 unvectorized_prior_sampler(size=5000) \u0026lt;ipython-input-17-2bc805c3fbae\u0026gt; in unvectorized_prior_sampler(X, size) 4 beta_g = normal(0, 1, size=size) 5 gamma = np.random.gamma(10, 10, size=size) ----\u0026gt; 6 z = normal(np.zeros(X.shape[-1]), 1, size=size) 7 beta = beta_g + gamma * z 8 sigma = np.random.gamma(3, 3, size=size) mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: shape mismatch: objects cannot be broadcast to a single shape Oh no! We can\u0026rsquo;t use the same size argument for drawing from every distribution, because it doesn\u0026rsquo;t broadcast well with the implicit distribution\u0026rsquo;s shape.\nWhat we want to do is set the sample_shape instead, which is agnostic of each distribution\u0026rsquo;s shape, so will try to work around the distribution_shape.\n# This implementation uses the argument sample_shape incorrectly def incorrect_vectorized_prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) beta = beta_g + gamma * z sigma = np.random.gamma(3, 3, size=sample_shape) return normal(np.dot(X, beta), sigma, size=sample_shape) incorrect_vectorized_prior_sampler(sample_shape=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-22-235899f566bb\u0026gt; in \u0026lt;module\u0026gt;() ----\u0026gt; 1 incorrect_vectorized_prior_sampler(sample_shape=5000) \u0026lt;ipython-input-21-f4692868adc5\u0026gt; in incorrect_vectorized_prior_sampler(X, sample_shape) 12 # We add the distribution's shape at the end of sample_shape 13 z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) ---\u0026gt; 14 beta = beta_g + gamma * z 15 sigma = np.random.gamma(3, 3, size=sample_shape) 16 return normal(np.dot(X, beta), sigma, size=sample_shape) ValueError: operands could not be broadcast together with shapes (5000,) (5000,3) Still not working! Lets zoom-in to what is causing the error. We see that we are summing beta_g\u0026rsquo;s samples with the product of gamma and z\u0026rsquo;s samples. Let\u0026rsquo;s look at the shapes of the samples:\n beta_g.shape == sample_shape gamma.shape == sample_shape z.shape == sample_shape + (X.shape[-1],) So in our particular examples these are shaped as:\n beta_g.shape == (5000,) gamma.shape == (5000,) z.shape == (5000, 3) So it makes sense that the product of gamma with z does not broadcast. You could ask yourself, why did this work at all when we call this function with sample_shape=None? 
The answer is that a single sample from each distribution (calling with sample_shape=()), has the distribution\u0026rsquo;s implicit shape, and those DO BROADCAST CORRECTLY:\n beta_g.shape == () gamma.shape == () z.shape == (3,) To make our model\u0026rsquo;s data generation process fully vectorized we need two things:\n To ensure that all the mathematical operations (sums, products, dots) and all the conditional dependences between variables (like $y$ and $\\vec{\\beta}$) work on the distribution\u0026rsquo;s core dimensions (batch_shape + event_shape). We choose this name to stick with numpy\u0026rsquo;s glossary of generalized universal functions. That the mathematical operations be vectorized on the extra dimensions added by the requested sample_shape (something like what is done with numpy\u0026rsquo;s generic ufuncs).\n# Overly expressive but _almost_ fully vectorized implementation def vectorized_prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) beta = beta_g[..., None] + gamma[..., None] * z sigma = np.random.gamma(3, 3, size=sample_shape) return normal(np.einsum(\u0026quot;...i,...i\u0026quot;, X, beta), sigma, size=sample_shape) sns.distplot(vectorized_prior_sampler(sample_shape=5000)); Things can still go horribly wrong if we pass a stack of multiple X vectors. This is a big deal because it happens when we set more than a single observed X and y.\nX = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) vectorized_prior_sampler(X=X, sample_shape=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-25-f754aa0fb06e\u0026gt; in \u0026lt;module\u0026gt;() 1 X = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) ----\u0026gt; 2 vectorized_prior_sampler(X=X, sample_shape=5000) \u0026lt;ipython-input-23-5506fae2796b\u0026gt; in vectorized_prior_sampler(X, sample_shape) 14 beta = beta_g[..., None] + gamma[..., None] * z 15 sigma = np.random.gamma(3, 3, size=sample_shape) ---\u0026gt; 16 return normal(np.einsum(\u0026quot;...i,...i\u0026quot;, X, beta), sigma, size=sample_shape) ~/anaconda3/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs) 1067 # If no optimization, run pure einsum 1068 if optimize_arg is False: -\u0026gt; 1069 return c_einsum(*operands, **kwargs) 1070 1071 valid_einsum_kwargs = ['out', 'dtype', 'order', 'casting'] ValueError: operands could not be broadcast together with remapped shapes [original-\u0026gt;remapped]: (10,3)-\u0026gt;(10,3) (5000,3)-\u0026gt;(5000,3) To solve this, we need to make our einsum aware of the extra observation dimensions in X, and then ensure that the result broadcasts with samples taken from sigma.\n# fully vectorized implementation def prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + 
(X.shape[-1],)) beta = beta_g[..., None] + gamma[..., None] * z sigma = np.random.gamma(3, 3, size=sample_shape) if X.ndim == 1: signature = \u0026quot;...i,...i\u0026quot; else: signature = \u0026quot;...{0}i,...i-\u0026gt;...{0}\u0026quot;.format(\u0026quot;\u0026quot;.join([chr(106 + dim) for dim in range(X.ndim - 1)])) sigma = np.reshape(sigma, sigma.shape + (1,) * (X.ndim - 1)) return normal(np.einsum(signature, X, beta), sigma, size=sample_shape + (X.shape[:-1])) X = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) prior = prior_sampler(X=X, sample_shape=5000) colors = [plt.get_cmap(\u0026quot;rainbow\u0026quot;)(x) for x in np.linspace(0, 1, len(X))] plt_x = np.linspace( np.floor(np.min(prior) / 1) * 1, np.ceil(np.max(prior) / 1) * 1, 1000 ) for offset, (p, color) in enumerate(zip(prior.T, colors)): plt_y = stats.gaussian_kde(p)(plt_x) plt.plot(plt_x - offset * 100, plt_y + offset * 0.001, color=color) plt.axis(\u0026quot;off\u0026quot;); Summary All of this data generation process using raw numpy was intended to showcase that:\n When you write down a model and only think about the distribution\u0026rsquo;s core dimensions, this will in general make it difficult to generate data (sample the prior or posterior predictive) in a vectorized fashion. To ensure proper vectorization, the mathematical operations that act on RVs must work on the core dimensions and be vectorized on the extra axes. The RVs that are themselves parameters of other distributions downstream will have an extra sample_shape axes when generating data. This means that the distribution parameters must be broadcastable excluding the sample_shape parts. pymc3 does most of this for you in sample_prior_predictive and sample_posterior_predictive. When we write down a model in pymc3, we can easily identify the distribution core dimensions. This allows us to vectorize the mathematical operations using numpy.vectorize with an adequate signature. Furthermore, the distribution parameters are broadcasted ignoring the sample_shape part thanks to the utilities defined in the pymc3.distributions.shape_utils module. However, at the moment this broadcasting works for scalar distributions only, the multivariate distributions are still not fully vectorized.\nPyMC3 shape model construction Before looking at the equivalent data generation model from above but written in pymc3, I\u0026rsquo;ll briefly remind you of three major things that one should always do with shapes in pymc3 models:\n When we create random variables inside a model, their shape cannot be implicitly inferred from the distribution parameters. This means that if we want to have multidimensional or batches of distributions we need to specify the desired shape during the distribution\u0026rsquo;s instantiation. When we create a random variable and specify the shape, we are not setting the random variable\u0026rsquo;s shape, we are setting the distribution\u0026rsquo;s shape and the distribution\u0026rsquo;s core dimensions. When a random variable is given observations, the ObservedRV\u0026rsquo;s shape (not the underlying distribution\u0026rsquo;s shape) is determined by the observed argument. At the moment, there are no checks in place between the ObservedRV and the underlying distribution\u0026rsquo;s shapes, but in the near future, an error will be raised if the ObservedRV\u0026rsquo;s shape is not consistent with a sample, with a given sample_shape, drawn from the underlying distribution. 
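To make these three points concrete, here is a minimal sketch of my own (assuming the pymc3 3.x API used throughout this post; the variable names and the values in the comments are illustrative, not taken from the post's runs):

```python
import numpy as np
import pymc3 as pm

with pm.Model() as sketch_model:
    # Point 1: the shape is not inferred from mu; it must be passed explicitly.
    # Point 2: shape=3 sets the *distribution's* shape (its core dimensions),
    # not the symbolic random variable's runtime shape.
    z = pm.Normal("z", mu=np.zeros(3), sigma=1, shape=3)
    # Point 3: the ObservedRV's shape follows `observed`, while shape=(3,)
    # only describes the underlying distribution.
    y = pm.Normal("y", mu=z, sigma=1, shape=(3,),
                  observed=np.random.randn(10, 3))

print(z.distribution.shape)  # [3]
print(y.distribution.shape)  # [3]
print(y.shape.eval())        # [10  3], i.e. the shape of the observed data
```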
Handling observations is really important in pymc3, so I\u0026rsquo;ll stop here a bit because there are a lot of ambiguous things revolving around shapes. Essentially ObservedRVs have an observed attribute that represent the data that they have observed. This data should have a shape that looks like sample_shape + distribution_shape. The distribution\u0026rsquo;s shape should always be provided, or it will be assumed to represent a scalar RV (distribution_shape=() and hence zero core dimensions). When we draw samples from the posterior predictive distribution, we let the ObservedRVs shape be treated as the distribution\u0026rsquo;s shape. When we draw samples from the prior predictive distribution, we want to ignore the ObservedRV\u0026rsquo;s shape and just work with the underlying distribution\u0026rsquo;s shape (this is currently not working due to an old open issue).\nNow lets define our model in pymc3\nimport numpy as np import theano from theano import tensor as tt import pymc3 as pm # Model definition is encapsulated to easily use with or without observations def model_factory(X=np.arange(3), y=None): with pm.Model() as model: beta_g = pm.Normal(\u0026quot;beta_g\u0026quot;, mu=0, sigma=1) gamma = pm.Gamma(\u0026quot;gamma\u0026quot;, alpha=10, beta=10) z = pm.Normal(\u0026quot;z\u0026quot;, mu=0, sigma=1, shape=X.shape[1]) sigma = pm.Gamma(\u0026quot;sigma\u0026quot;, alpha=3, beta=3) beta = beta_g + gamma * z y = pm.Normal( \u0026quot;y\u0026quot;, mu=tt.dot(X, beta), sigma=sigma, shape=X.shape[:-1], observed=y ) return model # Data generation X = np.random.randn(400, 3) X[:, 0] = 1. with model_factory(X): prior = pm.sample_prior_predictive(samples=1) y = prior.pop(\u0026quot;y\u0026quot;) true_parameters = prior print(true_parameters) # Inference with model_factory(X, y) as model: trace = pm.sample() pm.plot_posterior(trace) /home/lpaz/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters {'sigma': array(0.86392362), 'z': array([ 0.58161754, -0.92677227, 0.60365752]), 'gamma': array(0.90931996), 'sigma_log__': -0.1462709159678767, 'beta_g': array(-1.14124179), 'gamma_log__': -0.09505825242743682} Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [sigma, z, gamma, beta_g] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:16\u0026lt;00:00, 124.98draws/s] The number of effective samples is smaller than 25% for some parameters. The resulting trace seems mostly fine, the posterior densities are around where they should be, and the model seems to have converged well enough (I don\u0026rsquo;t show the checks here to avoid clobbering the post). Lets just pay attention to the shape parameters that I passed along in the model\u0026rsquo;s definition.\n I wrote a factory function to be able to easily create a model I\u0026rsquo;ll use without observations to use sample_prior_predictive to generate a set of observations. Then, I create a second model passing the observed data to do inference. All the distributions that are not scalar have a specified shape, including the observed y. This is because, the distribution shape cannot be inferred properly from the distribution parameters. 
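As a side note, the relation between the observed data's shape and the distribution's shape described above can be written down as a small helper (a hypothetical function of my own, not part of pymc3):

```python
def split_observed_shape(observed_shape, distribution_shape):
    """Split an observed array's shape into its sample_shape part.

    The trailing axes of the observations must match the distribution's
    core dimensions; whatever is left over in front is the sample_shape.
    """
    ndim = len(distribution_shape)
    split = len(observed_shape) - ndim
    sample_shape = tuple(observed_shape[:split])
    assert tuple(observed_shape[split:]) == tuple(distribution_shape)
    return sample_shape

# 400 iid observations of a scalar RV: sample_shape == (400,)
print(split_observed_shape((400,), ()))
# 10 iid observations of a distribution with shape (3,): sample_shape == (10,)
print(split_observed_shape((10, 3), (3,)))
```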
We can view some of the differences between RV and distribution shapes with this model:\ndef rv_shape_info(model, rv): try: shape = \u0026quot;is a symbolic theano expression that evaluates to {}\u0026quot;.format(rv.shape.eval()) except: shape = \u0026quot;is a symbolic theano expression, unknown until runtime\u0026quot; return ( \u0026quot;RV name = {}.\\n\u0026quot; \u0026quot;Type of RV = {}.\\n\u0026quot; \u0026quot;RV shape {}.\\n\u0026quot; \u0026quot;Distribution shape = {}\\n\u0026quot; \u0026quot;logp shape = {}.\\n\u0026quot;. format( rv.name, type(rv), shape, rv.distribution.shape, rv.logp_elemwise(model.test_point).shape ) ) beta_g = model.beta_g print(rv_shape_info(model, beta_g)) RV name = beta_g. Type of RV = \u0026lt;class 'pymc3.model.FreeRV'\u0026gt;. RV shape is a symbolic theano expression, unknown until runtime. Distribution shape = [] logp shape = (). z = model.z print(rv_shape_info(model, z)) RV name = z. Type of RV = \u0026lt;class 'pymc3.model.FreeRV'\u0026gt;. RV shape is a symbolic theano expression, unknown until runtime. Distribution shape = [3] logp shape = (3,). y = model.y print(rv_shape_info(model, y)) RV name = y. Type of RV = \u0026lt;class 'pymc3.model.ObservedRV'\u0026gt;. RV shape is a symbolic theano expression that evaluates to [400]. Distribution shape = [400] logp shape = (400,). Shape handling in logp You may have noticed that I printed something I called logp shape. Lets look at what that is. Each Distribution has a logp method with the following signature:\n... def logp(self, value): ... It takes an input value and returns a theano expression that can later be compiled to compute the logarithm of the probability density function of the distribution, conditioned on the value input. When an RV is instantiated, the distribution\u0026rsquo;s logp is called with one of two possible inputs:\n If no observed is provided, logp is called passing the RV as the value argument If observed is provided, logp is called passing observed as the value argument Keep in mind that this simply returns a theano.tensor, called logp_elemwiset, so it still has no value, and hence no shape to check. pymc3 later provides the RVs with two convenience methods to compute logp numerical values by compiling the returned theano.tensor into a function. The first is the widely known rv.logp method, which computes the sum of the logp_elemwiset on a given point input. The second, less known method, is rv.logp_elemwise, which simply returns the numerical output of logp_elemwiset on a given point input. 
This means that we can use rv.logp_elemwise to check how a given distribution handles the logp\u0026rsquo;s output shape\nIf you recall the discussion on event_shape and batch_shape, we can compare the logps of a batch of normals versus a multivariate normal on a given set of observations\ndef model_factory(batch_shape, event_shape, batch_obs=None, event_obs=None): with pm.Model() as model: batch = pm.Normal( \u0026quot;batch\u0026quot;, mu=0, sigma=1, shape=batch_shape, observed=batch_obs, ) event = pm.MvNormal( \u0026quot;event\u0026quot;, mu=np.zeros(event_shape), cov=np.diag(np.ones(event_shape)), shape=event_shape, observed=event_obs, ) return model batch_shape = (3,) event_shape = (3,) sample_shape = (4,) # We generate data to later be used as observations test_model = model_factory(batch_shape, event_shape) b = test_model.batch.random(size=sample_shape) e = test_model.event.random(size=sample_shape) # We create the model passing observations to the RVs so that we can compute logp test_model = model_factory(batch_shape, event_shape, batch_obs=b, event_obs=b) point = {\u0026quot;batch\u0026quot;: b, \u0026quot;event\u0026quot;: e} blp = test_model.batch.logp_elemwise(point) elp = test_model.event.logp_elemwise(point) print(\u0026quot;Shape of draw from a batch of normals = {}\u0026quot;.format(b.shape)) print(\u0026quot;Shape of draw from an MvNormal = {}\u0026quot;.format(e.shape)) print(\u0026quot;Shape of logp output from a batch of normals = {}\u0026quot;.format(blp.shape)) print(\u0026quot;Shape of logp output from an MvNormal = {}\u0026quot;.format(elp.shape)) print(\u0026quot;Total logp output from a batch of normals = {}\u0026quot;.format(np.sum(blp))) print(\u0026quot;Total logp output from an MvNormal = {}\u0026quot;.format(np.sum(elp))) Shape of draw from a batch of normals = (4, 3) Shape of draw from an MvNormal = (4, 3) Shape of logp output from a batch of normals = (4, 3) Shape of logp output from an MvNormal = (4,) Total logp output from a batch of normals = -22.535408832791916 Total logp output from an MvNormal = -22.53540883279192 The previous example highlights the following:\n event_shape and batch_shape are implicit to a distribution, and only accessible through a single shape argument. The logp computations distinguish between the event_shape and batch_shape of distributions, but drawing samples from distributions does not. Posterior predictive sampling We can now draw samples from the posterior predictive distribution and check the output\u0026rsquo;s shape. What we do at the moment is:\n We iterate through the points in the trace one at a time. This means that the RV values provided will only have their core dimensions. This makes forward sampling safer but slower. 
We use the ObservedRV\u0026rsquo;s shape as the distribution_shape, so all samples should have shape=(number_of_points_in_trace,) + observedRV.shape.eval().\nprint(\u0026quot;Total number of points in trace = {}\u0026quot;.format(len([p for p in trace.points()]))) Total number of points in trace = 1000\nwith model: ppc = pm.sample_posterior_predictive(trace) ppc[\u0026quot;y\u0026quot;].shape 100%|██████████| 1000\u0026frasl;1000 [00:01\u0026lt;00:00, 829.93it/s]\n(1000, 400)\n To better highlight the shape difference in sampling from an ObservedRVs prior distribution and its posterior predictive we\u0026rsquo;ll use the following model:\nwith pm.Model() as model: mu = pm.Normal(\u0026quot;mu\u0026quot;, 0, 1) obs = pm.Normal(\u0026quot;obs\u0026quot;, mu, 1, shape=(3,), observed=np.random.randn(10, 3) + 1) prior = obs.distribution.random(size=100) trace = pm.sample() ppc = pm.sample_posterior_predictive(trace, samples=100) print(\u0026quot;Shape of samples drawn from the prior distribution = {}\u0026quot;.format(prior.shape)) print(\u0026quot;Shape of samples drawn from the posterior predictive = {}\u0026quot;.format(ppc[\u0026quot;obs\u0026quot;].shape)) Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00\u0026lt;00:00, 2665.14draws/s] /home/lpaz/repos/pymc3/pymc3/sampling.py:1109: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample warnings.warn(\u0026quot;samples parameter is smaller than nchains times ndraws, some draws \u0026quot; 100%|██████████| 100/100 [00:00\u0026lt;00:00, 2240.26it/s] Shape of samples drawn from the prior distribution = (100, 3) Shape of samples drawn from the posterior predictive = (100, 10, 3) ","date":1566172800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1566172800,"objectID":"b33bf9180207918ee517121c4dc3a54f","permalink":"https://lucianopaz.github.io/2019/08/19/pymc3-shape-handling/","publishdate":"2019-08-19T00:00:00Z","relpermalink":"/2019/08/19/pymc3-shape-handling/","section":"post","summary":"This post is intended to explain:\n What the shape attribute of a pymc3 RV is. What\u0026rsquo;s the difference between an RV\u0026rsquo;s and its associated distribution\u0026rsquo;s shape. How does a distribution\u0026rsquo;s shape determine the shape of its logp output. The potential trouble this can bring with samples drawn from the prior or from the posterior predictive distributions. The many implicit semantically relevant shapes associated with pymc3 models. We will follow the naming convention used by tensorflow probability and talk about sample, batch and event shapes.","tags":["pymc","python","ppl"],"title":"PyMC3 shape handling","type":"post"},{"authors":null,"categories":["Other"],"content":"Hi everyone, and welcome to the first post of my website! 
I wanted a place to write about the things I\u0026rsquo;ve worked on, the topics that I find interesting and also some thoughts, notes and code, that I hope that you may find helpful, or at least interesting.\n","date":1564876800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1564876800,"objectID":"bdc92b56d808f6c7e15d98d1710a00e2","permalink":"https://lucianopaz.github.io/2019/08/04/hello-world/","publishdate":"2019-08-04T00:00:00Z","relpermalink":"/2019/08/04/hello-world/","section":"post","summary":"Hi everyone, and welcome to the first post of my website! I wanted a place to write about the things I\u0026rsquo;ve worked on, the topics that I find interesting and also some thoughts, notes and code, that I hope that you may find helpful, or at least interesting.","tags":["other"],"title":"Hello world!","type":"post"},{"authors":null,"categories":["neuroscience"],"content":"I used AI computational models to study human gameplay as part of the Matemarote project. It involved reinforcement learning, and technical skills such as SQL and javascript.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"bc4941250f32ab68b85a4fc819051131","permalink":"https://lucianopaz.github.io/project/behavior_modelling_with_ai/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/behavior_modelling_with_ai/","section":"project","summary":"I used AI computational models to study human gameplay as part of the Matemarote project. It involved reinforcement learning, and technical skills such as SQL and javascript.","tags":["artificial intelligence","behavior","javascript","databases","neuroscience"],"title":"Behavior modelling with AI","type":"project"},{"authors":null,"categories":["machine learning"],"content":"A machine learning tool to analyze and visualize geolocation data. It was part of the PHD4PMI initiative. Me and Natalia Grion, worked on this project for Innova SPA.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"d44a8ffc07fd400f68a4d445dc7416da","permalink":"https://lucianopaz.github.io/project/phd4pmi/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/phd4pmi/","section":"project","summary":"A machine learning tool to analyze and visualize geolocation data. It was part of the PHD4PMI initiative. Me and Natalia Grion, worked on this project for Innova SPA.","tags":["machine learning","geodata"],"title":"PhD4PMI","type":"project"},{"authors":null,"categories":["other"],"content":"I developed a joomla component for the front and backend of the research section of the physics department of the University of Buenos Aires\u0026rsquo; webpage. It used a combination of php, jQuery, bootstrap, and MySQL.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"092d56ddcc72a3eb4454baf9a1861e6c","permalink":"https://lucianopaz.github.io/project/web_page_df/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/web_page_df/","section":"project","summary":"I developed a joomla component for the front and backend of the research section of the physics department of the University of Buenos Aires\u0026rsquo; webpage. 
It used a combination of php, jQuery, bootstrap, and MySQL.","tags":["javascript","databases","php"],"title":"Physics Department Joomla research component","type":"project"},{"authors":null,"categories":["pymc"],"content":"As part of pymc-devs, I work on the development and maintenance of PyMC3, a python package for probabilistic programming and MCMC.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"8e9d3d29d071480eecc2210f1cdbebf7","permalink":"https://lucianopaz.github.io/project/pymc3/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/pymc3/","section":"project","summary":"As part of pymc-devs, I work on the development and maintenance of PyMC3, a python package for probabilistic programming and MCMC.","tags":["pymc","ppl","machine learning"],"title":"PyMC3","type":"project"},{"authors":null,"categories":["pymc"],"content":"As part of pymc-devs, I work on the development of PyMC4, a python package for probabilistic programming that works with Tensorflow probability.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"645b788007c6e72babf2f5f5acb52532","permalink":"https://lucianopaz.github.io/project/pymc4/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/pymc4/","section":"project","summary":"As part of pymc-devs, I work on the development of PyMC4, a python package for probabilistic programming that works with Tensorflow probability.","tags":["pymc","ppl","machine learning"],"title":"PyMC4","type":"project"},{"authors":null,"categories":["neuroscience"],"content":"Neural population model of rat primary somatosensory cortex and how it drives the fealing of the passage of time. This is an ongoing project at the Tactile perception and learning lab at SISSA, Trieste.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"4f88bc8d5c86cb7d1cc241a2c3c35e21","permalink":"https://lucianopaz.github.io/project/sensosy_processing_in_rats/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/sensosy_processing_in_rats/","section":"project","summary":"Neural population model of rat primary somatosensory cortex and how it drives the fealing of the passage of time. This is an ongoing project at the Tactile perception and learning lab at SISSA, Trieste.","tags":["pymc","machine learning","neuroscience"],"title":"Sensory processing in rats","type":"project"}] \ No newline at end of file +[{"authors":["lucianopaz"],"categories":null,"content":"I\u0026rsquo;m currently a machine learning engineer working at Innova SpA in Trieste, Italy.\nIn a past life I worked researching computational neuroscience. 
During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.\nI\u0026rsquo;ve worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.\nI\u0026rsquo;m also part of the development team of the pymc probabilistic programming packages written in python.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"cd61adf50213a72955c4ccbeccb26c34","permalink":"https://lucianopaz.github.io/authors/lucianopaz/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/authors/lucianopaz/","section":"authors","summary":"I\u0026rsquo;m currently a machine learning engineer working at Innova SpA in Trieste, Italy.\nIn a past life I worked researching computational neuroscience. During that time, I mostly built computational and mathematical models for behavior ranging from simple perceptual tasks in rodents to complex planning in humans.\nI\u0026rsquo;ve worked with diverse tools and frameworks such as bayesian inference, machine learning, artificial intelligence, and stochastic dynamics.\nI\u0026rsquo;m also part of the development team of the pymc probabilistic programming packages written in python.","tags":null,"title":"Luciano Paz","type":"authors"},{"authors":null,"categories":["ppl"],"content":" This post is intended to explain:\n What the shape attribute of a pymc3 RV is. What\u0026rsquo;s the difference between an RV\u0026rsquo;s and its associated distribution\u0026rsquo;s shape. How does a distribution\u0026rsquo;s shape determine the shape of its logp output. The potential trouble this can bring with samples drawn from the prior or from the posterior predictive distributions. The many implicit semantically relevant shapes associated with pymc3 models. We will follow the naming convention used by tensorflow probability and talk about sample, batch and event shapes. I plan to write a separate post with some best practices to avoid falling into the many circles of shape hell (or at least be able to maneuver inside them). The origin of all evil ambiguity Before going into the depths of PyMC3, we\u0026rsquo;ll take a detour around numpy\u0026rsquo;s random number generators to introduce the many semantically relevant shapes involved with random variables, samples drawn from probability distributions and how these affect the computed probability densities.\nnumpy has no concept of a random variable class, it provides some random number generators from which you can draw some samples that follow a particular probability distribution (I\u0026rsquo;m purposely not mentioning scipy.stats here). Lets draw some numbers using numpy.random.normal:\nimport numpy as np np.random.seed(12345) # Lets make a call using all the defaults x = np.random.normal() print(\u0026quot;Sample's value = {}\\nSample's shape = {}\u0026quot;.format(x, np.array(x).shape)) Sample's value = -0.20470765948471295 Sample's shape = () You can see that if we call the random number generator with empty arguments we get a single number, or more technically a scalar. We actually have to make a zero-sized array by wrapping the result in np.array(x) to be able to get a shape attribute out of it.\nSupplying the size argument we can control the number of draws we want to get. 
Well, we actually control the shape of the output array.\n# Lets make a call asking for a particular size x = np.random.normal(size=10) print(\u0026quot;Sample's value = {}\\nSample's shape = {}\u0026quot;.format(x, x.shape)) Sample's value = [ 0.47894334 -0.51943872 -0.5557303 1.96578057 1.39340583 0.09290788 0.28174615 0.76902257 1.24643474 1.00718936] Sample's shape = (10,) # Lets make a call asking for a tuple size x = np.random.normal(size=(10, 4, 2)) x.shape (10, 4, 2) Ok, so we control the number of draws we want to make using the size argument. The result of a call to the random number generator can return an ndarray or a scalar. But what happens if we change the distribution parameters, loc and scale?\n# Lets make a call supplying an array of locs x = np.random.normal(loc=np.arange(10)) x.shape (10,) We now also get an ndarray instead of a scalar! We get as many numbers as elements in the supplied loc distribution parameter. So what does size do now?\n# Lets make a call supplying an array of locs and asking for a given size x = np.random.normal(loc=np.arange(10), size=10) x.shape (10,) Wait, so what did size do here? The answer is essentially nothing!\nDistribution\u0026rsquo;s shape What is actually happening is that the distribution parameters, like loc and scale of a normal, implicitly define what we could call the distribution\u0026rsquo;s shape (when we get to multivariate distributions we will split this into batch_shape and event_shapes as in tensorflow probability, but for now lets stick to distribution_shape). The random number generator draws samples using the distribution\u0026rsquo;s implicit shape, but also vectorizes the random calls over the extra dimensions provided in size. If the distribution cannot broadcast with the supplied size, then an error is raised (it\u0026rsquo;s actually more pedantic than simple broadcasting).\n# size and loc broadcast so the random draws are vectorized over the extra dimensions of size x = np.random.normal(loc=np.arange(10), size=(300, 10)) x.shape (300, 10) # Each column has a different loc np.mean(x, axis=0) array([0.01963069, 0.95968487, 2.02679069, 2.99794185, 4.00357886, 4.98255339, 6.08488072, 6.895649 , 8.07455776, 9.02202938]) # size does not broadcast with loc, and an error is raised x = np.random.normal(loc=np.arange(10), size=3) x.shape --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-9-6ec74377ffcd\u0026gt; in \u0026lt;module\u0026gt;() 1 # size does not broadcast with loc, and an error is raised ----\u0026gt; 2 x = np.random.normal(loc=np.arange(10), size=3) 3 x.shape mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: shape mismatch: objects cannot be broadcast to a single shape # size and loc broadcast so the random draws are vectorized over the extra dimensions of size x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(10, 300)) x.shape (10, 300) # Each row has a different loc np.mean(x, axis=1) array([0.02937006, 0.98651717, 1.9862869 , 3.00312987, 4.017759 , 4.92210904, 6.01890166, 7.0359308 , 7.98685293, 8.88494283]) # size and loc broadcast but the non singleton dimensions in loc do not match # with the correspoding size element so an error is raised x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(1, 300)) x.shape --------------------------------------------------------------------------- ValueError Traceback (most recent call last) 
\u0026lt;ipython-input-12-1b49ca5414a0\u0026gt; in \u0026lt;module\u0026gt;() 1 # size and loc broadcast but the non singleton dimensions in loc do not match 2 # with the correspoding size element so an error is raised ----\u0026gt; 3 x = np.random.normal(loc=np.arange(10)[:, np.newaxis], size=(1, 300)) 4 x.shape mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: size is not compatible with inputs # The distribution's implictly defined shape depends on how the parameters # loc and scale broadcast with each other x = np.random.normal( loc=np.arange(3).reshape((3, 1)), scale=np.arange(4) + 1, size=(3000, 3, 4) ) x.shape (3000, 3, 4) # The second axis has different means and the last axis has different standard deviations np.mean(x, axis=0) array([[ 0.0102261 , -0.00975594, -0.0082977 , 0.11634068], [ 0.9840894 , 0.94276502, 0.93028859, 0.968283 ], [ 2.00062043, 1.99421095, 2.11480227, 1.83806223]]) np.std(x, axis=0) array([[0.98598852, 2.02771629, 2.98777904, 3.94358108], [1.01194307, 2.00820371, 3.0372083 , 3.94933656], [1.00660898, 2.02210172, 3.06703822, 3.98733117]]) Pointers that we need to understand so far: Distributions have parameters. These parameters can implicitly determine the distribution\u0026rsquo;s shape. Samples drawn from a distribution are ndarrays that have some shape which is controlled with the size of the draw and the distribution\u0026rsquo;s implicit shape. Any operation that is performed on a sample drawn from a probability distribution does not actually interact with the distribution\u0026rsquo;s shape, but with something that is determined by size along with the distribution\u0026rsquo;s shape. The word shape is shamelessly used all over the place! This is maybe the prime source of confusion in all things shape related. Multivariate distributions Up until now we only dealt with the normal distribution\u0026rsquo;s random number generator. This is a scalar distribution. What this means is that a single sample drawn from a scalar distribution is also a scalar. On the other hand, multivariate distributions, such as the Multinomial, Multivariate Normal and Wishart, amongst others, yield vectors or higher ranked tensors as their single samples. To make this more precise, we will to refer to a single sample yielded by a single distribution as an event, and its shape will be called the event_shape (we call them events because we follow tensorflow probability\u0026rsquo;s naming convention for shapes).\n Scalar distributions have an event_shape == (). Multivariate distributions have non empty event_shapes. But what happened when we passed a loc vector to the normal distribution? Did we somehow make a multivariate distribution implicitly? The answer is no, we did not make a multivariate distribution, we made a batch of many scalar distribution. If you\u0026rsquo;re confused, don\u0026rsquo;t worry. The difference between a batch and a multivariate distribution is not clear by just looking at the shapes of samples drawn from them. For example, if we have a batch of two scalar normal distributions, and compare it to a bivariate normal distribution, both have the same distribution shape, (2,). The difference will only become clear when we look at how the distribution\u0026rsquo;s logp is computed. 
To do this, we\u0026rsquo;ll rely on scipy.stats which provides a uniform API for many probability distributions and their logpdf methods.\nfrom scipy import stats loc = np.arange(2) scale = np.ones(2) observed = np.arange(2) batch_logp = stats.norm.logpdf(observed, loc, scale) event_logp = stats.multivariate_normal.logpdf(observed, loc, scale) print(\u0026quot;Batch of normals logp shape = {}\u0026quot;.format(batch_logp.shape)) print(\u0026quot;Multivariate normal logp shape = {}\u0026quot;.format(event_logp.shape)) print(\u0026quot;Batch of normals total logp = {}\u0026quot;.format(np.sum(batch_logp))) print(\u0026quot;Multivariate normal total logp = {}\u0026quot;.format(np.sum(event_logp))) Batch of normals logp shape = (2,) Multivariate normal logp shape = () Batch of normals total logp = -1.8378770664093453 Multivariate normal total logp = -1.8378770664093453 The distribution\u0026rsquo;s logp measures the probability density (or mass for discrete distributions) that a sample was drawn from it. For the batch of normals, each element in the batch shape corresponds to an independent observation that should be measured according to an independent distribution in the batch. For this reason, we get an output of shape (2,) from the batch of gaussian distributions. On the other hand, the bivariate gaussian has event_shape == (2,). The entire axis of the event_shape is consumed by the logp calculation.\nGenerally, it is possible to write a batch of multivariate distributions. The resulting distribution shape is the combination of the batch and event shapes.\nAs a final summary on the shape semantics: The event_shape is the shape of a single draw from the unbatched distribution. For scalar distributions, this is an empty tuple (). The batch_shape describes the shape of a draw from a batch of many distributions (like what we did with the normal draws when we supplied many loc and scales). The sample_shape describes the shape of the independent, identically distributed (IID) draws made from a distribution or batch of distributions. A distribution\u0026rsquo;s shape is made up of a batch shape and an event shape: distribution_shape == batch_shape + event_shape. When you draw random samples from a distribution, the resulting array has the following shape: sample_shape + batch_shape + event_shape When you compute the logp of an observed sample, the event part of the shape is consumed and the output\u0026rsquo;s shape will be sample_shape + batch_shape. The three shape-related concepts are a bit complicated to grasp so I recommend that you read Eric Ma\u0026rsquo;s very nice explanation on them or just look at TFP\u0026rsquo;s colab on them.\nShapes in PyMC3 PyMC3 is intended to be a user-friendly modeling package in python. It aims to make probabilistic programming easy by providing an intuitive syntax to describe data generative processes and out-of-the-box Markov Chain Monte Carlo (MCMC) methods to make inferences from observed data. It is designed with inference in mind. It uses theano as its computational backend to be able to express mathematical operations on random variables symbolically. pymc3\u0026rsquo;s design has led to a lot of ambiguity about what shape actually means, and how it affects sampling under different circumstances. The main reasons for the shape ambiguity in pymc3 are:\n The three shape concepts (sample, batch and event shapes) are never explicitly disambiguated in the Distribution nor in the RV classes. 
There is no distinction between batch and event shapes, users can only set the distribution\u0026rsquo;s shape, which is an implicit combination of both. We deal with theano.tensor that represent mathematical operations over random variates. These operations hide most of the semantical batch and event shapes under a single distribution shape in the best case, and as a symbolic shape that is unknown until run time in the worst case. The mathematical operations may work well for the model definition when one only needs to pm.sample, but might run into shape problems when running pm.sample_prior_predictive and pm.sample_posterior_predictive. Model logp reduces the random variable\u0026rsquo;s logp outputs using theano.tensor.sum so the sample-batch shape information is lost for sampling. The shape of ObservedRVs is partly set by the observed argument, but this is not equivalent to the distribution\u0026rsquo;s shape, leading to an ambiguous and confusing situation for users when they define their models, and potential errors when using sample_prior_predictive. The distribution\u0026rsquo;s batch and event shape is not inferred from the supplied parameters, one must always specify the distribution\u0026rsquo;s shape during model construction. Random variable\u0026rsquo;s shapes and problems with sampling To highlight some of these complications, we have to look at data generative models. Random variables are used to define data generation processes like the following:\n$$ \\beta_{G} \\sim \\mathcal{N}(0, 1)\\\\\n\\gamma \\sim \\mathcal{\\Gamma}(10, 10)\\\\\nz_{i} \\sim \\mathcal{N}(0, 1)\\\\\n\\beta_{i} = \\beta_{G} + \\gamma z_{i}\\\\\n\\sigma \\sim \\mathcal{\\Gamma}(3, 3)\\\\\ny \\sim \\mathcal{N}\\left(\\vec{X}. \\vec{\\beta}, \\sigma\\right) $$\nThis is a simple hierarchical model usually used for linear regressions with partial pooling. The key pieces of information for end users are: $y$ and $\\vec{X}$. These are the actually observed values. $y$ represents an observed output of the assumed data generation process, and $\\vec{X}$ would be the corresponding input. We assume $\\vec{X}$ is a vector because it can have more than one feature that is used to generate the output observation. In general, we have more than a single observation, in thoses cases $\\vec{y}$ is a vector (or 1D array) and $\\mathbb{X}$ is a matrix (or 2D array, made from vertically stacking many $\\vec{X}$).\nThe rest of the parameters define the unobserved random variables that we will try to infer given the set of observations. $\\sigma$ is the observation noise. $\\vec{\\beta}$ are the regression weights, and they are a summation of two other sets of random variables: $\\beta_{G}$, $\\gamma$ and $\\vec{z}$.\nWhen we do MCMC, we propose candidate values for the random variables based on the model\u0026rsquo;s logp. We use theano and our RVs are essentially placeholders or Variables whose values are unknown when the model is defined, but are updated during sampling. The only thing that is fixed during the model\u0026rsquo;s construction is what we will call the distribution\u0026rsquo;s core dimensions, ndim, which is a fancy way of calling the axes defined by batch_shape and event_shape. MCMC sampling in pymc3 operates on single samples from distributions and only the logp shape handling of ObservedRVs can bring shape related trouble (like what happened in this issue with Categorical.logp).\nOn the other hand, when we do forward sampling (i.e. 
sample_prior_predictive and sample_posterior_predictive) the same data generation model that was defined only taking into account the core dimensions, has to be used with the extra dimensions given by sample_shape. This leads to a whole world of pain which we will illustrate now using only numpy to write the data generation model:\n# This implementation uses the argument size incorrectly def unvectorized_prior_sampler(X=np.arange(3), size=None): normal = np.random.normal beta_g = normal(0, 1, size=size) gamma = np.random.gamma(10, 10, size=size) z = normal(np.zeros(X.shape[-1]), 1, size=size) beta = beta_g + gamma * z sigma = np.random.gamma(3, 3, size=size) return normal(np.dot(X, beta), sigma, size=size) So lets try out our nice little unvectorized_prior_sampler using only the default size=None. This means that\n We will be drawing a single sample from each distribution Said sample will have the distribution\u0026rsquo;s implicit shape All mathematical operations will be performed on the resulting floats or ndarrays that are returned as the distributions samples.\nfrom matplotlib import pyplot as plt import seaborn as sns plt.ion() sns.distplot([unvectorized_prior_sampler() for _ in range(5000)]); Nice! Everything works! Now lets try to do the same but taking advantage of size to draw multiple values at once and vectorize operations\nunvectorized_prior_sampler(size=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-20-30082f0e1390\u0026gt; in \u0026lt;module\u0026gt;() ----\u0026gt; 1 unvectorized_prior_sampler(size=5000) \u0026lt;ipython-input-17-2bc805c3fbae\u0026gt; in unvectorized_prior_sampler(X, size) 4 beta_g = normal(0, 1, size=size) 5 gamma = np.random.gamma(10, 10, size=size) ----\u0026gt; 6 z = normal(np.zeros(X.shape[-1]), 1, size=size) 7 beta = beta_g + gamma * z 8 sigma = np.random.gamma(3, 3, size=size) mtrand.pyx in mtrand.RandomState.normal() mtrand.pyx in mtrand.cont2_array() ValueError: shape mismatch: objects cannot be broadcast to a single shape Oh no! 
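Before fixing it, let me strip the failure down to its core. This is a pared-down sketch of the offending z draw only (not the full sampler): loc already carries the distribution\u0026rsquo;s implicit (3,) shape, and a scalar size cannot broadcast with it, while a size that ends in that implicit shape can.

import numpy as np

loc = np.zeros(3)  # the distribution's implicit shape is (3,)
try:
    # a scalar size does not broadcast with the implicit (3,) shape
    np.random.normal(loc=loc, size=5000)
except ValueError as err:
    print(err)  # shape mismatch: objects cannot be broadcast to a single shape
# appending the implicit shape to the requested size works
x = np.random.normal(loc=loc, size=(5000, 3))
print(x.shape)  # (5000, 3)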
We can\u0026rsquo;t use the same size argument for drawing from every distribution, because it doesn\u0026rsquo;t broadcast well with the implicit distribution\u0026rsquo;s shape.\nWhat we want to do is set the sample_shape instead, which is agnostic of each distribution\u0026rsquo;s shape, so will try to work around the distribution_shape.\n# This implementation uses the argument sample_shape incorrectly def incorrect_vectorized_prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) beta = beta_g + gamma * z sigma = np.random.gamma(3, 3, size=sample_shape) return normal(np.dot(X, beta), sigma, size=sample_shape) incorrect_vectorized_prior_sampler(sample_shape=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-22-235899f566bb\u0026gt; in \u0026lt;module\u0026gt;() ----\u0026gt; 1 incorrect_vectorized_prior_sampler(sample_shape=5000) \u0026lt;ipython-input-21-f4692868adc5\u0026gt; in incorrect_vectorized_prior_sampler(X, sample_shape) 12 # We add the distribution's shape at the end of sample_shape 13 z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) ---\u0026gt; 14 beta = beta_g + gamma * z 15 sigma = np.random.gamma(3, 3, size=sample_shape) 16 return normal(np.dot(X, beta), sigma, size=sample_shape) ValueError: operands could not be broadcast together with shapes (5000,) (5000,3) Still not working! Lets zoom-in to what is causing the error. We see that we are summing beta_g\u0026rsquo;s samples with the product of gamma and z\u0026rsquo;s samples. Let\u0026rsquo;s look at the shapes of the samples:\n beta_g.shape == sample_shape gamma.shape == sample_shape z.shape == sample_shape + (X.shape[-1],) So in our particular examples these are shaped as:\n beta_g.shape == (5000,) gamma.shape == (5000,) z.shape == (5000, 3) So it makes sense that the product of gamma with z does not broadcast. You could ask yourself, why did this work at all when we call this function with sample_shape=None? The answer is that a single sample from each distribution (calling with sample_shape=()), has the distribution\u0026rsquo;s implicit shape, and those DO BROADCAST CORRECTLY:\n beta_g.shape == () gamma.shape == () z.shape == (3,) To make our model\u0026rsquo;s data generation process fully vectorized we need two things:\n To ensure that all the mathematical operations (sums, products, dots) and all the conditional dependences between variables (like $y$ and $\\vec{\\beta}$) work on the distribution\u0026rsquo;s core dimensions (batch_shape + event_shape). We choose this name to stick with numpy\u0026rsquo;s glossary of generalized universal functions. 
That the mathematical operations be vectorized on the extra dimensions added by the requested sample_shape (something like what is done with numpy\u0026rsquo;s generic ufuncs).\n# Overly expressive but _almost_ fully vectorized implementation def vectorized_prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) beta = beta_g[..., None] + gamma[..., None] * z sigma = np.random.gamma(3, 3, size=sample_shape) return normal(np.einsum(\u0026quot;...i,...i\u0026quot;, X, beta), sigma, size=sample_shape) sns.distplot(vectorized_prior_sampler(sample_shape=5000)); Things can still go horribly wrong if we pass a stack of multiple X vectors. This is a big deal because it happens when we set more than a single observed X and y.\nX = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) vectorized_prior_sampler(X=X, sample_shape=5000) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) \u0026lt;ipython-input-25-f754aa0fb06e\u0026gt; in \u0026lt;module\u0026gt;() 1 X = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) ----\u0026gt; 2 vectorized_prior_sampler(X=X, sample_shape=5000) \u0026lt;ipython-input-23-5506fae2796b\u0026gt; in vectorized_prior_sampler(X, sample_shape) 14 beta = beta_g[..., None] + gamma[..., None] * z 15 sigma = np.random.gamma(3, 3, size=sample_shape) ---\u0026gt; 16 return normal(np.einsum(\u0026quot;...i,...i\u0026quot;, X, beta), sigma, size=sample_shape) ~/anaconda3/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs) 1067 # If no optimization, run pure einsum 1068 if optimize_arg is False: -\u0026gt; 1069 return c_einsum(*operands, **kwargs) 1070 1071 valid_einsum_kwargs = ['out', 'dtype', 'order', 'casting'] ValueError: operands could not be broadcast together with remapped shapes [original-\u0026gt;remapped]: (10,3)-\u0026gt;(10,3) (5000,3)-\u0026gt;(5000,3) To solve this, we need to make our einsum aware of the extra observation dimensions in X, and then ensure that the result broadcasts with samples taken from sigma.\n# fully vectorized implementation def prior_sampler(X=np.arange(3), sample_shape=None): normal = np.random.normal if sample_shape is None: sample_shape = () try: sample_shape = tuple(sample_shape) except TypeError: sample_shape = (sample_shape,) beta_g = normal(0, 1, size=sample_shape) gamma = np.random.gamma(10, 10, size=sample_shape) # We add the distribution's shape at the end of sample_shape z = normal(np.zeros(X.shape[-1]), 1, size=sample_shape + (X.shape[-1],)) beta = beta_g[..., None] + gamma[..., None] * z sigma = np.random.gamma(3, 3, size=sample_shape) if X.ndim == 1: signature = \u0026quot;...i,...i\u0026quot; else: signature = \u0026quot;...{0}i,...i-\u0026gt;...{0}\u0026quot;.format(\u0026quot;\u0026quot;.join([chr(106 + dim) for dim in range(X.ndim - 1)])) sigma = np.reshape(sigma, sigma.shape + (1,) * (X.ndim - 1)) return normal(np.einsum(signature, X, beta), sigma, size=sample_shape + (X.shape[:-1])) X = np.tile(np.arange(3).reshape((1, 3)), (10, 1)) prior = prior_sampler(X=X, sample_shape=5000) colors = [plt.get_cmap(\u0026quot;rainbow\u0026quot;)(x) for x in np.linspace(0, 1, 
len(X))] plt_x = np.linspace( np.floor(np.min(prior) / 1) * 1, np.ceil(np.max(prior) / 1) * 1, 1000 ) for offset, (p, color) in enumerate(zip(prior.T, colors)): plt_y = stats.gaussian_kde(p)(plt_x) plt.plot(plt_x - offset * 100, plt_y + offset * 0.001, color=color) plt.axis(\u0026quot;off\u0026quot;); Summary All of this data generation process using raw numpy was intended to showcase that:\n When you write down a model and only think about the distribution\u0026rsquo;s core dimensions, this will in general make it difficult to generate data (sample the prior or posterior predictive) in a vectorized fashion. To ensure proper vectorization, the mathematical operations that act on RVs must work on the core dimensions and be vectorized on the extra axes. The RVs that are themselves parameters of other distributions downstream will have an extra sample_shape axes when generating data. This means that the distribution parameters must be broadcastable excluding the sample_shape parts. pymc3 does most of this for you in sample_prior_predictive and sample_posterior_predictive. When we write down a model in pymc3, we can easily identify the distribution core dimensions. This allows us to vectorize the mathematical operations using numpy.vectorize with an adequate signature. Furthermore, the distribution parameters are broadcasted ignoring the sample_shape part thanks to the utilities defined in the pymc3.distributions.shape_utils module. However, at the moment this broadcasting works for scalar distributions only, the multivariate distributions are still not fully vectorized.\nPyMC3 shape model construction Before looking at the equivalent data generation model from above but written in pymc3, I\u0026rsquo;ll briefly remind you of three major things that one should always do with shapes in pymc3 models:\n When we create random variables inside a model, their shape cannot be implicitly inferred from the distribution parameters. This means that if we want to have multidimensional or batches of distributions we need to specify the desired shape during the distribution\u0026rsquo;s instantiation. When we create a random variable and specify the shape, we are not setting the random variable\u0026rsquo;s shape, we are setting the distribution\u0026rsquo;s shape and the distribution\u0026rsquo;s core dimensions. When a random variable is given observations, the ObservedRV\u0026rsquo;s shape (not the underlying distribution\u0026rsquo;s shape) is determined by the observed argument. At the moment, there are no checks in place between the ObservedRV and the underlying distribution\u0026rsquo;s shapes, but in the near future, an error will be raised if the ObservedRV\u0026rsquo;s shape is not consistent with a sample, with a given sample_shape, drawn from the underlying distribution. Handling observations is really important in pymc3, so I\u0026rsquo;ll stop here a bit because there are a lot of ambiguous things revolving around shapes. Essentially ObservedRVs have an observed attribute that represent the data that they have observed. This data should have a shape that looks like sample_shape + distribution_shape. The distribution\u0026rsquo;s shape should always be provided, or it will be assumed to represent a scalar RV (distribution_shape=() and hence zero core dimensions). When we draw samples from the posterior predictive distribution, we let the ObservedRVs shape be treated as the distribution\u0026rsquo;s shape. 
When we draw samples from the prior predictive distribution, we want to ignore the ObservedRV\u0026rsquo;s shape and just work with the underlying distribution\u0026rsquo;s shape (this is currently not working due to an old open issue).\nNow lets define our model in pymc3\nimport numpy as np import theano from theano import tensor as tt import pymc3 as pm # Model definition is encapsulated to easily use with or without observations def model_factory(X=np.arange(3), y=None): with pm.Model() as model: beta_g = pm.Normal(\u0026quot;beta_g\u0026quot;, mu=0, sigma=1) gamma = pm.Gamma(\u0026quot;gamma\u0026quot;, alpha=10, beta=10) z = pm.Normal(\u0026quot;z\u0026quot;, mu=0, sigma=1, shape=X.shape[1]) sigma = pm.Gamma(\u0026quot;sigma\u0026quot;, alpha=3, beta=3) beta = beta_g + gamma * z y = pm.Normal( \u0026quot;y\u0026quot;, mu=tt.dot(X, beta), sigma=sigma, shape=X.shape[:-1], observed=y ) return model # Data generation X = np.random.randn(400, 3) X[:, 0] = 1. with model_factory(X): prior = pm.sample_prior_predictive(samples=1) y = prior.pop(\u0026quot;y\u0026quot;) true_parameters = prior print(true_parameters) # Inference with model_factory(X, y) as model: trace = pm.sample() pm.plot_posterior(trace) /home/lpaz/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters {'sigma': array(0.86392362), 'z': array([ 0.58161754, -0.92677227, 0.60365752]), 'gamma': array(0.90931996), 'sigma_log__': -0.1462709159678767, 'beta_g': array(-1.14124179), 'gamma_log__': -0.09505825242743682} Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [sigma, z, gamma, beta_g] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:16\u0026lt;00:00, 124.98draws/s] The number of effective samples is smaller than 25% for some parameters. The resulting trace seems mostly fine, the posterior densities are around where they should be, and the model seems to have converged well enough (I don\u0026rsquo;t show the checks here to avoid clobbering the post). Lets just pay attention to the shape parameters that I passed along in the model\u0026rsquo;s definition.\n I wrote a factory function to be able to easily create a model I\u0026rsquo;ll use without observations to use sample_prior_predictive to generate a set of observations. Then, I create a second model passing the observed data to do inference. All the distributions that are not scalar have a specified shape, including the observed y. This is because, the distribution shape cannot be inferred properly from the distribution parameters. We can view some of the differences between RV and distribution shapes with this model:\ndef rv_shape_info(model, rv): try: shape = \u0026quot;is a symbolic theano expression that evaluates to {}\u0026quot;.format(rv.shape.eval()) except: shape = \u0026quot;is a symbolic theano expression, unknown until runtime\u0026quot; return ( \u0026quot;RV name = {}.\\n\u0026quot; \u0026quot;Type of RV = {}.\\n\u0026quot; \u0026quot;RV shape {}.\\n\u0026quot; \u0026quot;Distribution shape = {}\\n\u0026quot; \u0026quot;logp shape = {}.\\n\u0026quot;. format( rv.name, type(rv), shape, rv.distribution.shape, rv.logp_elemwise(model.test_point).shape ) ) beta_g = model.beta_g print(rv_shape_info(model, beta_g)) RV name = beta_g. 
Type of RV = \u0026lt;class 'pymc3.model.FreeRV'\u0026gt;. RV shape is a symbolic theano expression, unknown until runtime. Distribution shape = [] logp shape = (). z = model.z print(rv_shape_info(model, z)) RV name = z. Type of RV = \u0026lt;class 'pymc3.model.FreeRV'\u0026gt;. RV shape is a symbolic theano expression, unknown until runtime. Distribution shape = [3] logp shape = (3,). y = model.y print(rv_shape_info(model, y)) RV name = y. Type of RV = \u0026lt;class 'pymc3.model.ObservedRV'\u0026gt;. RV shape is a symbolic theano expression that evaluates to [400]. Distribution shape = [400] logp shape = (400,). Shape handling in logp You may have noticed that I printed something I called logp shape. Lets look at what that is. Each Distribution has a logp method with the following signature:\n... def logp(self, value): ... It takes an input value and returns a theano expression that can later be compiled to compute the logarithm of the probability density function of the distribution, conditioned on the value input. When an RV is instantiated, the distribution\u0026rsquo;s logp is called with one of two possible inputs:\n If no observed is provided, logp is called passing the RV as the value argument If observed is provided, logp is called passing observed as the value argument Keep in mind that this simply returns a theano.tensor, called logp_elemwiset, so it still has no value, and hence no shape to check. pymc3 later provides the RVs with two convenience methods to compute logp numerical values by compiling the returned theano.tensor into a function. The first is the widely known rv.logp method, which computes the sum of the logp_elemwiset on a given point input. The second, less known method, is rv.logp_elemwise, which simply returns the numerical output of logp_elemwiset on a given point input. 
This means that we can use rv.logp_elemwise to check how a given distribution handles the logp\u0026rsquo;s output shape\nIf you recall the discussion on event_shape and batch_shape, we can compare the logps of a batch of normals versus a multivariate normal on a given set of observations\ndef model_factory(batch_shape, event_shape, batch_obs=None, event_obs=None): with pm.Model() as model: batch = pm.Normal( \u0026quot;batch\u0026quot;, mu=0, sigma=1, shape=batch_shape, observed=batch_obs, ) event = pm.MvNormal( \u0026quot;event\u0026quot;, mu=np.zeros(event_shape), cov=np.diag(np.ones(event_shape)), shape=event_shape, observed=event_obs, ) return model batch_shape = (3,) event_shape = (3,) sample_shape = (4,) # We generate data to later be used as observations test_model = model_factory(batch_shape, event_shape) b = test_model.batch.random(size=sample_shape) e = test_model.event.random(size=sample_shape) # We create the model passing observations to the RVs so that we can compute logp test_model = model_factory(batch_shape, event_shape, batch_obs=b, event_obs=b) point = {\u0026quot;batch\u0026quot;: b, \u0026quot;event\u0026quot;: e} blp = test_model.batch.logp_elemwise(point) elp = test_model.event.logp_elemwise(point) print(\u0026quot;Shape of draw from a batch of normals = {}\u0026quot;.format(b.shape)) print(\u0026quot;Shape of draw from an MvNormal = {}\u0026quot;.format(e.shape)) print(\u0026quot;Shape of logp output from a batch of normals = {}\u0026quot;.format(blp.shape)) print(\u0026quot;Shape of logp output from an MvNormal = {}\u0026quot;.format(elp.shape)) print(\u0026quot;Total logp output from a batch of normals = {}\u0026quot;.format(np.sum(blp))) print(\u0026quot;Total logp output from an MvNormal = {}\u0026quot;.format(np.sum(elp))) Shape of draw from a batch of normals = (4, 3) Shape of draw from an MvNormal = (4, 3) Shape of logp output from a batch of normals = (4, 3) Shape of logp output from an MvNormal = (4,) Total logp output from a batch of normals = -22.535408832791916 Total logp output from an MvNormal = -22.53540883279192 The previous example highlights the following:\n event_shape and batch_shape are implicit to a distribution, and only accessible through a single shape argument. The logp computations distinguish between the event_shape and batch_shape of distributions, but drawing samples from distributions does not. Posterior predictive sampling We can now draw samples from the posterior predictive distribution and check the output\u0026rsquo;s shape. What we do at the moment is:\n We iterate through the points in the trace one at a time. This means that the RV values provided will only have their core dimensions. This makes forward sampling safer but slower. 
We use the ObservedRV\u0026rsquo;s shape as the distribution_shape, so all samples should have shape=(number_of_points_in_trace,) + observedRV.shape.eval().\nprint(\u0026quot;Total number of points in trace = {}\u0026quot;.format(len([p for p in trace.points()]))) Total number of points in trace = 1000\nwith model: ppc = pm.sample_posterior_predictive(trace) ppc[\u0026quot;y\u0026quot;].shape 100%|██████████| 1000\u0026frasl;1000 [00:01\u0026lt;00:00, 829.93it/s]\n(1000, 400)\n To better highlight the shape difference in sampling from an ObservedRVs prior distribution and its posterior predictive we\u0026rsquo;ll use the following model:\nwith pm.Model() as model: mu = pm.Normal(\u0026quot;mu\u0026quot;, 0, 1) obs = pm.Normal(\u0026quot;obs\u0026quot;, mu, 1, shape=(3,), observed=np.random.randn(10, 3) + 1) prior = obs.distribution.random(size=100) trace = pm.sample() ppc = pm.sample_posterior_predictive(trace, samples=100) print(\u0026quot;Shape of samples drawn from the prior distribution = {}\u0026quot;.format(prior.shape)) print(\u0026quot;Shape of samples drawn from the posterior predictive = {}\u0026quot;.format(ppc[\u0026quot;obs\u0026quot;].shape)) Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [mu] Sampling 2 chains, 0 divergences: 100%|██████████| 2000/2000 [00:00\u0026lt;00:00, 2665.14draws/s] /home/lpaz/repos/pymc3/pymc3/sampling.py:1109: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample warnings.warn(\u0026quot;samples parameter is smaller than nchains times ndraws, some draws \u0026quot; 100%|██████████| 100/100 [00:00\u0026lt;00:00, 2240.26it/s] Shape of samples drawn from the prior distribution = (100, 3) Shape of samples drawn from the posterior predictive = (100, 10, 3) ","date":1566172800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1566172800,"objectID":"b33bf9180207918ee517121c4dc3a54f","permalink":"https://lucianopaz.github.io/2019/08/19/pymc3-shape-handling/","publishdate":"2019-08-19T00:00:00Z","relpermalink":"/2019/08/19/pymc3-shape-handling/","section":"post","summary":"This post is intended to explain:\n What the shape attribute of a pymc3 RV is. What\u0026rsquo;s the difference between an RV\u0026rsquo;s and its associated distribution\u0026rsquo;s shape. How does a distribution\u0026rsquo;s shape determine the shape of its logp output. The potential trouble this can bring with samples drawn from the prior or from the posterior predictive distributions. The many implicit semantically relevant shapes associated with pymc3 models. We will follow the naming convention used by tensorflow probability and talk about sample, batch and event shapes.","tags":["pymc","python","ppl"],"title":"PyMC3 shape handling","type":"post"},{"authors":null,"categories":["Other"],"content":"Hi everyone, and welcome to the first post of my website! 
I wanted a place to write about the things I\u0026rsquo;ve worked on, the topics that I find interesting and also some thoughts, notes, and code that I hope you may find helpful, or at least interesting.\n","date":1564876800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1564876800,"objectID":"bdc92b56d808f6c7e15d98d1710a00e2","permalink":"https://lucianopaz.github.io/2019/08/04/hello-world/","publishdate":"2019-08-04T00:00:00Z","relpermalink":"/2019/08/04/hello-world/","section":"post","summary":"Hi everyone, and welcome to the first post of my website! I wanted a place to write about the things I\u0026rsquo;ve worked on, the topics that I find interesting and also some thoughts, notes, and code that I hope you may find helpful, or at least interesting.","tags":["other"],"title":"Hello world!","type":"post"},{"authors":null,"categories":["neuroscience"],"content":"I used AI computational models to study human gameplay as part of the Matemarote project. It involved reinforcement learning and technical skills such as SQL and javascript.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"bc4941250f32ab68b85a4fc819051131","permalink":"https://lucianopaz.github.io/project/behavior_modelling_with_ai/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/behavior_modelling_with_ai/","section":"project","summary":"I used AI computational models to study human gameplay as part of the Matemarote project. It involved reinforcement learning and technical skills such as SQL and javascript.","tags":["artificial intelligence","behavior","javascript","databases","neuroscience"],"title":"Behavior modelling with AI","type":"project"},{"authors":null,"categories":["machine learning"],"content":"A machine learning tool to analyze and visualize geolocation data. It was part of the PHD4PMI initiative. Natalia Grion and I worked on this project for Innova SpA.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"d44a8ffc07fd400f68a4d445dc7416da","permalink":"https://lucianopaz.github.io/project/phd4pmi/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/phd4pmi/","section":"project","summary":"A machine learning tool to analyze and visualize geolocation data. It was part of the PHD4PMI initiative. Natalia Grion and I worked on this project for Innova SpA.","tags":["machine learning","geodata"],"title":"PhD4PMI","type":"project"},{"authors":null,"categories":["other"],"content":"I developed a joomla component for the front and backend of the research section of the physics department of the University of Buenos Aires\u0026rsquo; webpage. It used a combination of php, jQuery, bootstrap, and MySQL.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"092d56ddcc72a3eb4454baf9a1861e6c","permalink":"https://lucianopaz.github.io/project/web_page_df/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/web_page_df/","section":"project","summary":"I developed a joomla component for the front and backend of the research section of the physics department of the University of Buenos Aires\u0026rsquo; webpage. 
It used a combination of php, jQuery, bootstrap, and MySQL.","tags":["javascript","databases","php"],"title":"Physics Department Joomla research component","type":"project"},{"authors":null,"categories":["pymc"],"content":"As part of pymc-devs, I work on the development and maintenance of PyMC3, a python package for probabilistic programming and MCMC.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"8e9d3d29d071480eecc2210f1cdbebf7","permalink":"https://lucianopaz.github.io/project/pymc3/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/pymc3/","section":"project","summary":"As part of pymc-devs, I work on the development and maintenance of PyMC3, a python package for probabilistic programming and MCMC.","tags":["pymc","ppl","machine learning"],"title":"PyMC3","type":"project"},{"authors":null,"categories":["pymc"],"content":"As part of pymc-devs, I work on the development of PyMC4, a python package for probabilistic programming that works with Tensorflow probability.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"645b788007c6e72babf2f5f5acb52532","permalink":"https://lucianopaz.github.io/project/pymc4/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/pymc4/","section":"project","summary":"As part of pymc-devs, I work on the development of PyMC4, a python package for probabilistic programming that works with Tensorflow probability.","tags":["pymc","ppl","machine learning"],"title":"PyMC4","type":"project"},{"authors":null,"categories":["neuroscience"],"content":"A neural population model of rat primary somatosensory cortex and how it drives the feeling of the passage of time. This is an ongoing project at the Tactile perception and learning lab at SISSA, Trieste.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":-62135596800,"objectID":"4f88bc8d5c86cb7d1cc241a2c3c35e21","permalink":"https://lucianopaz.github.io/project/sensosy_processing_in_rats/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/project/sensosy_processing_in_rats/","section":"project","summary":"A neural population model of rat primary somatosensory cortex and how it drives the feeling of the passage of time. This is an ongoing project at the Tactile perception and learning lab at SISSA, Trieste.","tags":["pymc","machine learning","neuroscience"],"title":"Sensory processing in rats","type":"project"}] \ No newline at end of file