- Pin sub-packages to pre-refactor versions
- `Agent` calls now accept keyword arguments, which are passed on to the policy, e.g. a testmode flag if the policy accepts one.
- Transition to `RLCore.forward`, `RLBase.act!`, `RLBase.plan!` and `Base.push!` syntax instead of functional objects for hooks, policies and environments
- Drop `ReinforcementLearning.jl` from dependencies, use `ReinforcementLearningCore.jl` instead
- Support `device_rng` in SAC #606
- Test experiments on GPU by default #549
- Added an experiment for DQN training on discrete `PendulumEnv` (#537)
- Transition to `RLCore.forward`, `RLBase.act!`, `RLBase.plan!` and `Base.push!` syntax instead of functional objects for hooks, policies and environments
- Reduce allocations, improve performance of `RandomWalk1D`
- Add tests to `RandomWalk1D`
- Chase down and fix JET.jl errors
- Update `TicTacToeEnv` and `RockPaperScissorsEnv` to support new `MultiAgentPolicy` setup
- Fix bug with `is_discrete_space` #566
- Bugfix of CartPoleEnv with keyword arguments
- Bugfix of CartPoleEnv with Float32
- Added a continuous option for CartPoleEnv #543.
- Support `action_space(::TicTacToeEnv, player)`.
- Fixed bugs in plotting `MountainCarEnv` (#537)
- Implemented plotting for `PendulumEnv` (#537)
- Bugfix with `ZeroTo` #534
- Add `GraphShortestPathEnv`. #445
- Add `StockTradingEnv` from the paper Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. This environment is a good testbed for multi-continuous action space algorithms. #428
- Add `SequentialEnv` environment wrapper to turn a simultaneous environment into a sequential one.
- Load `AcrobotEnv` lazily to reduce the dependency on `OrdinaryDiffEq`.
- Transition to `RLCore.forward`, `RLBase.act!`, `RLBase.plan!` and `Base.push!` syntax instead of functional objects for hooks, policies and environments
- Reduce excess `TDLearner` allocations by using Tuple instead of Array
- Make keyword argument `n_actions` in `TabularPolicy` optional. #300
- Extensive refactor based on RLBase.jl `v0.11`, most components not yet ported
- Fix multi-dimension action space in TD3. #624
- Support `device_rng` in SAC #606
- Fix warning about `vararg.data` in [email protected] #560
- Make BC GPU compatible #553
- Make most algorithms GPU compatible #549
- Support `length` method for `VectorWSARTTrajectory`.
- Revert part of the unexpected change of PPO in the last PR.
- Fixed the bug with MaskedPPOTrajectory reported here
- Update the complete SAC implementation and modify some details based on the original paper. #365
- Add some extra keyword parameters for `BehaviorCloningPolicy` to use it online. #390
- Moved all the experiments into a new package `ReinforcementLearningExperiments.jl`. The related dependencies are also removed (`BSON.jl`, `StableRNGs.jl`, `TensorBoardLogger.jl`).
- Add functionality for fetching d4rl datasets as an iterable DataSet. Credits: https://arxiv.org/abs/2004.07219
- This supports d4rl and d4rl-pybullet and Google Research DQN atari datasets.
- Uses DataDeps for data dependency management.
- This package also supports RL Unplugged Datasets.
- Support for google-research/deep_ope added.
- Transition to `RLCore.forward`, `RLBase.act!`, `RLBase.plan!` and `Base.push!` syntax instead of functional objects for hooks, policies and environments
- Update POMDPModelTools -> POMDPTools
- Add `next_player!` method to support `Sequential` `MultiAgent` environments
- Implement `Base.:(==)` for `Space`. #428
- Add default `Base.:(==)` and `Base.hash` methods for `AbstractEnv`. #348
- Fix hook issue with 'extra' call; always run `push!` at the end of an episode, regardless of whether it was stopped or terminated
- Transition to `RLCore.forward`, `RLBase.act!`, `RLBase.plan!` and `Base.push!` syntax instead of functional objects for hooks, policies and environments
- Add back multi-agent support with `MultiAgentPolicy` and `MultiAgentHook`
- Use correct Flux.stack function signature
- Reduce allocations, improve performance of `RandomPolicy`
- Chase down and fix JET.jl errors
- Add tests for `StopAfterStep`, `StopAfterEpisode`
- Add tests, improve performance of `RewardsPerEpisode`
- Refactor `Agent` for speedup
- When sending a `CircularArrayBuffer` to GPU devices, convert `CircularArrayBuffer` into `CuArray` instead of the adapted `CircularArrayBuffer` of `CuArray`. #606
- Fix warning about `vararg.data` in [email protected] #560
- Make `GaussianNetwork` differentiable. #549
- Fixed a bug [1] with the `DoOnExit` hook (#537)
- Added some convenience hooks for rendering rollout episodes (#537)
- Fixed the method-overwritten warning for `device` from `CUDA.jl`.
- Add two extra optional keyword arguments (`min_σ` and `max_σ`) in `GaussianNetwork` to clip the output of `logσ`. #428
- Add GaussianNetwork and DuelingNetwork into ReinforcementLearningCore.jl as general components. #370
- Export `WeightedSoftmaxExplorer`. #382
- Minor bug & typo fixes
- Removed `ResizeImage` preprocessor to reduce the dependency on `ImageTransformations`.
- Show unicode plot at the end of an experiment in the `TotalRewardPerEpisode` hook.