diff --git a/05_function-minimization-with-autograd.Rmd b/05_function-minimization-with-autograd.Rmd index 71ef892..4f2504d 100644 --- a/05_function-minimization-with-autograd.Rmd +++ b/05_function-minimization-with-autograd.Rmd @@ -2,23 +2,110 @@ **Learning objectives:** -- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY +- Apply concepts learned in the previous two chapters -## SLIDE 1 {-} +## An Optimization Classic {.unnumbered} -- ADD SLIDES AS SECTIONS (`##`). -- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF. +**Example**: -## Meeting Videos {-} +*Rosenbrock function*: A function of two variables with minimum at $(a,a^2)$, which lies inside a narrow valley: -### Cohort 1 {-} +$$ +(a- x_1)^2 + b(x_2 - x_1^2)^2 +$$ + +
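To see why the minimum sits at $(a, a^2)$, and as a hand-derived reference for the gradient that autograd will compute for us below, write $f(x_1, x_2)$ for the expression above. Differentiating gives

$$
\frac{\partial f}{\partial x_1} = -2(a - x_1) - 4 b \, x_1 (x_2 - x_1^2), \qquad
\frac{\partial f}{\partial x_2} = 2 b \, (x_2 - x_1^2),
$$

which vanishes exactly when $x_2 = x_1^2$ and $x_1 = a$, i.e. at $(a, a^2)$.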
[![Rosenbrock function](images/rosenbrock.png)](https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/optim_1.html#an-optimization-classic)
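The figure above is taken from the book. A rough version can be drawn with base R graphics; the sketch below is not from the book, and the grid ranges, resolution, and number of contour levels are arbitrary choices. It uses the same `a = 1`, `b = 5` that are set in the next chunk.

```{r}
# Minimal sketch (not from the book): contour plot of the Rosenbrock valley
# with base R graphics. Grid ranges and nlevels are arbitrary choices.
a <- 1
b <- 5

x1_grid <- seq(-2, 2, length.out = 200)
x2_grid <- seq(-1, 3, length.out = 200)

# outer() evaluates the function on every (x1, x2) grid combination
z <- outer(x1_grid, x2_grid,
           function(x1, x2) (a - x1)^2 + b * (x2 - x1^2)^2)

contour(x1_grid, x2_grid, z, nlevels = 30, xlab = "x1", ylab = "x2")
points(1, 1, pch = 19, col = "red")  # the minimum at (a, a^2) = (1, 1)
```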

Below we set values for `a` and `b` and define the `rosenbrock` function. We expect the minimum of the function to be at $(1,1)$ (when `a = 1`).

```{r}
a <- 1
b <- 5

rosenbrock <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  (a - x1)^2 + b * (x2 - x1^2)^2
}
```

## Minimization from Scratch {.unnumbered}

**Goal**: Starting from a point `(x1, x2)`, find the minimum of the Rosenbrock function.

**Approach**: Use the function's gradient.

**Setup**:

```{r}
library(torch)

lr <- 0.01 # learning rate
num_iterations <- 1000

x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
```

`x` is the parameter with respect to which we want to compute the function's derivative. Thus, we set `requires_grad = TRUE`. We have arbitrarily chosen `x = (-1, 1)` as the starting point of our search.

Next we perform the minimization. For each iteration we will:

1. Compute the value of the `rosenbrock` function at the current value of `x`.

2. Compute the gradient at `x` (i.e., the direction of steepest ascent).

3. Take a step of size `lr` in the negative direction of the gradient.

4. Repeat.

A few things to point out about the code below:

- We wrap the update step in `with_no_grad()`. Because we set `requires_grad = TRUE` in the definition of `x`, torch would otherwise record this operation on `x`, too, as part of the derivative calculation, which we don't want.
- Recall from [Chapter 3](https://r4ds.github.io/bookclub-torch/operations-on-tensors.html) that `x$sub_()` (*with an underscore*) modifies the value of `x` in place. Similarly, `x$grad$zero_()` modifies the `grad` field of `x` in place.
- We call `x$grad$zero_()` to zero out the `grad` field of `x` after each step. By default, torch accumulates gradients.

```{r}
for (i in 1:num_iterations) {
  if (i %% 200 == 0) cat("Iteration: ", i, "\n")

  # Compute the value of the function:
  value <- rosenbrock(x)
  if (i %% 200 == 0) cat("Value is: ", as.numeric(value), "\n")

  # Compute the gradient
  value$backward()
  if (i %% 200 == 0) cat("Gradient is: ", as.matrix(x$grad), "\n\n")

  with_no_grad({
    x$sub_(lr * x$grad) # take a step of size lr in the negative direction of the gradient
    x$grad$zero_()      # zero out the grad field of x
  })
}
```

Let's check the value of `x`:

```{r}
x
```

It's close to $(1,1)$, the true minimum!

> Exercise: What kind of difference does the learning rate make? Try `lr = 0.001` and `lr = 0.1`.

## Meeting Videos {.unnumbered}

### Cohort 1 {.unnumbered}

`r knitr::include_url("https://www.youtube.com/embed/URL")`
- Meeting chat log -``` +Meeting chat log + +``` LOG ``` +
diff --git a/images/rosenbrock.png b/images/rosenbrock.png new file mode 100644 index 0000000..3369910 Binary files /dev/null and b/images/rosenbrock.png differ