It will be useful to have a running example for the course.
Here we will conduct inference for a logistic regression model for a binary outcome based on some covariates. Observation $y_i \in \{0,1\}$ (for $i=1,\dots,n$) is assumed to be Bernoulli, $y_i \sim \text{Bern}(p_i)$, with $\text{logit}(p_i) = x_i^\top\beta$, where $x_i$ is the $i$th row of the $n\times p$ covariate matrix $X$ and $\beta$ is the $p$-vector of parameters. The log-likelihood can then be written as
$$\ell(\beta; y) = y^\top X\beta - \mathbf{1}^\top \log(\mathbf{1} + \exp(X\beta)),$$
where $\exp(\cdot)$ and $\log(\cdot)$ are applied elementwise.
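As a concrete illustration, here is a minimal sketch of this log-likelihood in Python with numpy (the data below are purely synthetic, just to exercise the function):

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood for logistic regression, in the form
    l(beta) = y'Xb - sum(log(1 + exp(Xb))), using log1p for stability."""
    eta = X @ beta                                # linear predictor, one value per observation
    return float(y @ eta - np.sum(np.log1p(np.exp(eta))))

# hypothetical synthetic data, just for illustration
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # intercept + 2 covariates
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
beta = np.zeros(3)
ll0 = log_likelihood(beta, X, y)  # at beta = 0, every p_i = 0.5, so ll = -n*log(2)
```

At $\beta = 0$ every fitted probability is $1/2$, so the value is simply $-n\log 2$, which makes a handy sanity check.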
Note that I discuss the derivation of this likelihood in detail in a series of blog posts.
Some languages and frameworks can auto-diff likelihoods like this, but we can also differentiate by hand:
$$\nabla \ell(\beta) = X^\top (y - p), \quad \text{where } p = \text{expit}(X\beta)$$
and $\text{expit}(x) = 1/(1+e^{-x})$ is the inverse of the logit function, applied elementwise.
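A sketch of the hand-derived gradient, checked against a central finite-difference approximation of the log-likelihood (again on hypothetical synthetic data):

```python
import numpy as np

def log_lik(beta, X, y):
    eta = X @ beta
    return float(y @ eta - np.sum(np.log1p(np.exp(eta))))

def grad_log_lik(beta, X, y):
    """Analytic gradient: X'(y - p), with p_i = 1/(1 + exp(-x_i'beta))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

# finite-difference sanity check on hypothetical data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
y = rng.integers(0, 2, size=6).astype(float)
beta = 0.1 * rng.normal(size=3)
g = grad_log_lik(beta, X, y)
eps = 1e-6
g_num = np.array([(log_lik(beta + eps * e, X, y) - log_lik(beta - eps * e, X, y)) / (2 * eps)
                  for e in np.eye(3)])  # central differences, one coordinate at a time
```

If the algebra is right, `g` and `g_num` should agree to several decimal places.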
For our running example we will use a very simple gradient ascent algorithm to try to maximise the likelihood, iterating
$$\beta_{k+1} = \beta_k + \lambda\, \nabla\ell(\beta_k)$$
for some learning rate $\lambda > 0$.
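A minimal sketch of such a gradient ascent loop, fitted to synthetic data with known coefficients (the data, learning rate, and iteration count are all illustrative choices, not tuned values):

```python
import numpy as np

def ascend(X, y, beta0, learning_rate=1e-3, iters=20000):
    """Plain steepest ascent on the logistic log-likelihood:
    beta <- beta + lambda * X'(y - p). A sketch, not a tuned optimiser."""
    beta = beta0.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        beta = beta + learning_rate * (X.T @ (y - p))
    return beta

# hypothetical synthetic data with known coefficients
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, 2.0])
probs = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < probs).astype(float)

beta_hat = ascend(X, y, np.zeros(3))
grad_at_opt = X.T @ (y - 1.0 / (1.0 + np.exp(-(X @ beta_hat))))
```

At a maximum the gradient should be (numerically) zero, so the norm of `grad_at_opt` is a cheap convergence diagnostic, and `beta_hat` should land near `true_beta` up to sampling noise.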
We will be analysing the "Pima" training dataset, with 200 observations and 7 predictors. Including an intercept as the first covariate gives a parameter vector of length $p = 8$.
For a small dataset like this, there is no problem using the gradient of the full likelihood in a simple steepest-ascent algorithm, so that's what we'll start with. But if you are interested in optimisation, you can then go on to experiment with adapting the learning rate, accelerated gradient methods, stochastic gradient ascent, and so on, according to your interests.
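To illustrate one of those extensions, here is a minimal minibatch stochastic gradient ascent sketch with a simple geometric decay of the learning rate; the batch size, decay factor, and synthetic data are all arbitrary illustrative choices:

```python
import numpy as np

def log_lik(beta, X, y):
    eta = X @ beta
    return float(y @ eta - np.sum(np.log1p(np.exp(eta))))

def sga(X, y, beta0, learning_rate=0.1, epochs=50, batch=32, seed=0):
    """Minibatch stochastic gradient ascent on the logistic log-likelihood,
    using the mean per-observation gradient within each shuffled minibatch
    and geometrically decaying the learning rate each epoch."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = beta0.copy()
    for _ in range(epochs):
        for b in np.array_split(rng.permutation(n), max(1, n // batch)):
            p = 1.0 / (1.0 + np.exp(-(X[b] @ beta)))
            beta = beta + (learning_rate / len(b)) * (X[b].T @ (y[b] - p))
        learning_rate *= 0.95   # simple decay schedule
    return beta

# hypothetical synthetic data with known coefficients
rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([0.5, -1.0, 1.5])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = sga(X, y, np.zeros(3))
```

The noisy updates mean `beta_hat` only hovers near the maximiser rather than converging exactly, but the log-likelihood at `beta_hat` should comfortably beat the starting point.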