Merge pull request #16 from sebidoe/master
missing pi in V of VFA with oracle
mlindauer authored Mar 21, 2021
2 parents bb7fdb0 + e74ac2e commit e2e932c
Showing 3 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion w05_function_approx/t02_vfa_grad.tex
@@ -162,7 +162,7 @@
\begin{itemize}
\item Represent a value function (or state-action value function) for a
particular policy with a weighted linear combination of features
- $$ \hat{V}(s; \vec{w}) = \sum_{j=1}^n \vec{x}_j (s) \vec{w}_j = \vec{x}(s)^T\vec{w}$$
+ $$ \hat{V}^\pi(s; \vec{w}) = \sum_{j=1}^n \vec{x}_j (s) \vec{w}_j = \vec{x}(s)^T\vec{w}$$
\item Objective function is
$$ J(\vec{w}) = \mathbb{E}[(V^\pi(s) - \hat{V}^\pi(s; \vec{w}))^2]$$
\item Recall weight update:
4 changes: 2 additions & 2 deletions w05_function_approx/t03_vfa_mc_td.tex
@@ -43,8 +43,8 @@
\item Concretely when using linear VFA for policy evaluation

\begin{eqnarray}
- \Delta \vec{w} &=& \alpha (G_t - \hat{V} (s_t, \vec{w})) \nabla_\vec{w}\hat{V}(s_t; \vec{w}) \nonumber\\
- &=& \alpha (G_t - \hat{V} (s_t, \vec{w})) \vec{x}(s_t) \nonumber\\
+ \Delta \vec{w} &=& \alpha (G_t - \hat{V} (s_t; \vec{w})) \nabla_\vec{w}\hat{V}(s_t; \vec{w}) \nonumber\\
+ &=& \alpha (G_t - \hat{V} (s_t; \vec{w})) \vec{x}(s_t) \nonumber\\
&=& \alpha (G_t - \vec{x}(s_t)^T \vec{w}) \vec{x}(s_t) \nonumber
\end{eqnarray}
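
The update rule touched by this hunk can be illustrated with a minimal Python sketch. It is not part of the commit: the names `v_hat` and `mc_update`, the one-hot features, and the step size are assumptions chosen for illustration. It implements only the last line of the derivation, $\Delta \vec{w} = \alpha (G_t - \vec{x}(s_t)^T \vec{w}) \vec{x}(s_t)$.

```python
def v_hat(x, w):
    """Linear value estimate x(s)^T w for feature vector x and weights w."""
    return sum(xj * wj for xj, wj in zip(x, w))


def mc_update(w, x, g, alpha):
    """One Monte Carlo step: w + alpha * (G_t - x(s_t)^T w) * x(s_t)."""
    error = g - v_hat(x, w)  # G_t minus the current estimate
    return [wj + alpha * error * xj for wj, xj in zip(w, x)]


# Hypothetical one-hot features for a two-state problem.
w = [0.0, 0.0]
w = mc_update(w, x=[1.0, 0.0], g=2.0, alpha=0.5)
print(w)  # [1.0, 0.0]
```

With one-hot features only the weight of the visited state moves, which makes the step easy to check by hand.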

2 changes: 1 addition & 1 deletion w07_policy_search/t06_pg_algorithms.tex
@@ -19,7 +19,7 @@
$$\nabla_\theta V(\theta) = \frac{1}{m} \sum_{i=1}^{m} R(\tau^{(i)}) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta (a_t^{(i)} \mid s_t^{(i)}) $$

\begin{itemize}
- \item Unbiased but very noise
+ \item Unbiased but very noisy
\item Fixes that can make it practical
\begin{itemize}
\item Temporal structure
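
The likelihood-ratio estimate shown at the top of this hunk can be sketched in Python as follows. This is an illustration only, not part of the commit. It assumes a state-independent softmax policy over discrete actions (a simplification of the slide's $\pi_\theta(a \mid s)$), and the names `grad_log_pi` and `pg_estimate` and the two sample trajectories are hypothetical.

```python
import math


def softmax(theta):
    """Numerically stable softmax over the action preferences theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, action):
    """Gradient of log pi_theta(action) for a state-independent softmax policy."""
    probs = softmax(theta)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def pg_estimate(theta, trajectories):
    """(1/m) * sum_i R(tau_i) * sum_t grad log pi_theta(a_t^(i))."""
    grad = [0.0] * len(theta)
    for actions, ret in trajectories:
        for a in actions:
            g = grad_log_pi(theta, a)
            grad = [gk + ret * gi for gk, gi in zip(grad, g)]
    return [gk / len(trajectories) for gk in grad]


# Two hypothetical trajectories: (actions taken, total return R(tau)).
trajectories = [([0, 0], 1.0), ([1, 0], 0.0)]
print(pg_estimate([0.0, 0.0], trajectories))  # [0.5, -0.5]
```

Every action in a trajectory is weighted by the same total return $R(\tau)$, which is one source of the noise the slide mentions.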
