Chapter 5 (#3)
* Added chapter 5 content

* Added link to chapter 3. Added additional formatting.

* Final updates for meeting.
AmandaRP authored Oct 25, 2023
1 parent 5f750cf commit 1e899e8
Showing 2 changed files with 95 additions and 8 deletions.
05_function-minimization-with-autograd.Rmd: 95 additions & 8 deletions

**Learning objectives:**

- Apply concepts learned in the previous two chapters

## An Optimization Classic {.unnumbered}

**Example**:

*Rosenbrock function*: A function of two variables with minimum at $(a,a^2)$, which lies inside a narrow valley:

$$
f(x_1, x_2) = (a - x_1)^2 + b(x_2 - x_1^2)^2
$$

<center>[![rosenbrock function](images/rosenbrock.png)](https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/optim_1.html#an-optimization-classic)</center>

Below we set values for `a` and `b` and define the `rosenbrock` function. We expect the minimum of the function to be at $(1,1)$ (when `a=1`).

```{r}
a <- 1
b <- 5
rosenbrock <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  (a - x1)^2 + b * (x2 - x1^2)^2
}
```
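
As a quick sanity check (our addition, not part of the book's code), we can evaluate the function at the expected minimum and at the starting point used in the next section:

```{r}
rosenbrock(c(1, 1))   # should be 0: the minimum when a = 1
rosenbrock(c(-1, 1))  # the starting point used below; equals 4 here
```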

## Minimization from Scratch {.unnumbered}

**Goal**: Starting from a point `(x1, x2)`, find the minimum of the Rosenbrock function.

**Approach**: Use the function's gradient.

**Setup**:

```{r}
library(torch)
lr <- 0.01 # learning rate
num_iterations <- 1000
x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
```

`x` is the parameter with respect to which we want to compute the function's derivative. Thus, we set `requires_grad = TRUE`. We have arbitrarily chosen `x = (-1, 1)` as a starting point of our search.
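
Before running the full loop, here is a quick check we added (not from the book): a single backward pass should reproduce the hand-computed gradient. Differentiating gives $\frac{\partial f}{\partial x_1} = -2(a - x_1) - 4 b x_1 (x_2 - x_1^2)$ and $\frac{\partial f}{\partial x_2} = 2 b (x_2 - x_1^2)$, which at the starting point $(-1, 1)$ (with `a = 1`, `b = 5`) evaluates to $(-4, 0)$:

```{r}
value <- rosenbrock(x)
value$backward()
x$grad          # should hold -4 and 0
x$grad$zero_()  # reset the accumulated gradient so the loop below starts clean
```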

Next we perform the minimization. For each iteration we will:

1. Compute the value of the `rosenbrock` function at the current value of `x`.

2. Compute the gradient at `x` (i.e. direction of steepest ascent).

3. Take a step of size `lr` in the (negative) direction of the gradient.

4. Repeat.

A few things to point out about the code below:

- We use the `with_no_grad()` function. Reason: because we set `requires_grad = TRUE` when defining `x`, torch would otherwise include every operation on `x` (including the update step itself) in the derivative calculation, which we don't want.
- Recall from [Chapter 3](https://r4ds.github.io/bookclub-torch/operations-on-tensors.html) that `x$sub_()` (*with an underscore*) will modify the value of `x`. Similarly, `x$grad$zero_()` will also modify `x`.
- We use `x$grad$zero_()` to zero out the `grad` field of `x`, because by default torch accumulates gradients; a small illustration of this follows below.
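
Here is a minimal illustration of that accumulation behavior (our sketch, using a throwaway tensor `y` rather than anything from the book):

```{r}
y <- torch_tensor(2, requires_grad = TRUE)
(y^2)$backward()
y$grad           # 4, i.e. d(y^2)/dy evaluated at y = 2
(y^2)$backward()
y$grad           # 8: the new gradient was added to the stored one
y$grad$zero_()
y$grad           # 0 again after zeroing
```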

```{r}
for (i in 1:num_iterations) {
  if (i %% 200 == 0) cat("Iteration: ", i, "\n")

  # Compute value of the function:
  value <- rosenbrock(x)
  if (i %% 200 == 0) cat("Value is: ", as.numeric(value), "\n")

  # Compute the gradient:
  value$backward()
  if (i %% 200 == 0) cat("Gradient is: ", as.matrix(x$grad), "\n\n")

  with_no_grad({
    x$sub_(lr * x$grad) # Take a step of size lr in the (negative) direction of the gradient
    x$grad$zero_()      # Zero out the grad field of x
  })
}
```

Let's check the value of `x`:

```{r}
x
```

It's close to $(1,1)$ (the true minimum)!
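
We can also evaluate the function at this point (our addition); the value should be close to the minimum value of $0$:

```{r}
as.numeric(rosenbrock(x))
```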

> **Exercise**: What kind of difference does the learning rate make? Try `lr = 0.001` and `lr = 0.1`, respectively.

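One way to explore this exercise (a sketch only; the `run_gd` helper below is ours, not from the book) is to wrap the loop above in a function parameterized by the learning rate:

```{r}
run_gd <- function(lr, num_iterations = 1000) {
  x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
  for (i in 1:num_iterations) {
    value <- rosenbrock(x)
    value$backward()
    with_no_grad({
      x$sub_(lr * x$grad)
      x$grad$zero_()
    })
  }
  as.numeric(x)
}

run_gd(0.001)  # smaller steps: expect to be farther from (1, 1) after the same number of iterations
run_gd(0.01)   # the setting used above
run_gd(0.1)    # steps this large can overshoot the valley and blow up (Inf/NaN)
```
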
## Meeting Videos {.unnumbered}

### Cohort 1 {.unnumbered}

`r knitr::include_url("https://www.youtube.com/embed/URL")`

<details>
<summary>Meeting chat log</summary>

```
LOG
```

</details>
Binary file added images/rosenbrock.png
