diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle index bdf9754..afc1e24 100644 Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree index c7cb16c..dba0f4c 100644 Binary files a/docs/build/doctrees/index.doctree and b/docs/build/doctrees/index.doctree differ diff --git a/docs/build/doctrees/rst/api.doctree b/docs/build/doctrees/rst/api.doctree index b069046..608833c 100644 Binary files a/docs/build/doctrees/rst/api.doctree and b/docs/build/doctrees/rst/api.doctree differ diff --git a/docs/build/doctrees/rst/blog.doctree b/docs/build/doctrees/rst/blog.doctree index fa0abdf..a65fe47 100644 Binary files a/docs/build/doctrees/rst/blog.doctree and b/docs/build/doctrees/rst/blog.doctree differ diff --git a/docs/build/doctrees/rst/landing.doctree b/docs/build/doctrees/rst/landing.doctree index b2ba35c..a53e495 100644 Binary files a/docs/build/doctrees/rst/landing.doctree and b/docs/build/doctrees/rst/landing.doctree differ diff --git a/docs/build/html/.buildinfo b/docs/build/html/.buildinfo index c6627a1..8037061 100644 --- a/docs/build/html/.buildinfo +++ b/docs/build/html/.buildinfo @@ -1,4 +1,4 @@ # Sphinx build info version 1 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. 
-config: 4bbe4e1bdc45b7d096f5e7a0a5eb5873 +config: 6be11b3893c1b9feee4dcb2d0620068b tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/docs/build/html/_images/0-5_blog_envpool.png b/docs/build/html/_images/0-5_blog_envpool.png new file mode 100644 index 0000000..46a472e Binary files /dev/null and b/docs/build/html/_images/0-5_blog_envpool.png differ diff --git a/docs/build/html/_images/0-5_blog_header.png b/docs/build/html/_images/0-5_blog_header.png new file mode 100644 index 0000000..1c49024 Binary files /dev/null and b/docs/build/html/_images/0-5_blog_header.png differ diff --git a/docs/build/html/_sources/rst/api.rst.txt b/docs/build/html/_sources/rst/api.rst.txt index 505e364..58dc7e0 100644 --- a/docs/build/html/_sources/rst/api.rst.txt +++ b/docs/build/html/_sources/rst/api.rst.txt @@ -9,7 +9,7 @@ Emulation Wrap your environments for broad compatibility. Supports passing creator functions, classes, or env objects. The API of the returned PufferEnv is the same as Gym/PettingZoo. -.. autoclass:: pufferlib.emulation.GymPufferEnv +.. autoclass:: pufferlib.emulation.GymnasiumPufferEnv :members: :undoc-members: :noindex: @@ -19,93 +19,21 @@ Wrap your environments for broad compatibility. Supports passing creator functio :undoc-members: :noindex: -Registry -######## +Environments +############ -make_env functions and policies for included environments. +All included environments expose make_env and env_creator functions. make_env is the one that you want most of the time. The other one is used to expose e.g. class interfaces for environments that support them so that you can pass around static references. -Atari -***** - -.. automodule:: pufferlib.registry.atari - :members: - :undoc-members: - :noindex: +Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have *custom* policies, and the default simply flattens observations before applying a linear layer. 
Atari, Procgen, Neural MMO, Nethack/Minihack, and Pokemon Red currently have reasonable policies. +The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through e.g. pufferlib.environments.squared.make_env -Butterfly -********* - -.. automodule:: pufferlib.registry.butterfly +.. automodule:: pufferlib.environments.squared.environment :members: :undoc-members: :noindex: - -Classic Control -*************** - -.. automodule:: pufferlib.registry.classic_control - :members: - :undoc-members: - :noindex: - -Crafter -******* - -.. automodule:: pufferlib.registry.crafter - :members: - :undoc-members: - :noindex: - -Griddly -******* - -.. automodule:: pufferlib.registry.griddly - :members: - :undoc-members: - :noindex: - - -MAgent -****** - -.. automodule:: pufferlib.registry.magent - :members: - :undoc-members: - :noindex: - - -MicroRTS -******** - -.. automodule:: pufferlib.registry.microrts - :members: - :undoc-members: - :noindex: - - -NetHack -******* - -.. automodule:: pufferlib.registry.nethack - :members: - :undoc-members: - :noindex: - - -Neural MMO -********** - -.. automodule:: pufferlib.registry.nmmo - :members: - :undoc-members: - :noindex: - -Procgen -******* - -.. automodule:: pufferlib.registry.procgen +.. autoclass:: pufferlib.environments.squared.torch.Policy :members: :undoc-members: :noindex: @@ -113,7 +41,7 @@ Procgen Models ###### -PufferLib model API and default policies +PufferLib model default policies and optional API. These are not required to use PufferLib. .. automodule:: pufferlib.models :members: @@ -150,7 +78,7 @@ Wrap your PyTorch policies for use with CleanRL :undoc-members: :noindex: -Recurrence requires you to subclass our base policy instead. See the default policies for examples. +Wrap your PyTorch policies for use with CleanRL but add an LSTM. This requires you to use our policy API. It's pretty simple -- see the default policies for examples. .. 
autoclass:: pufferlib.frameworks.cleanrl.RecurrentPolicy :members: @@ -160,9 +88,14 @@ Recurrence requires you to subclass our base policy instead. See the default pol RLlib Binding ############# -Wrap your policies for use with RLlib (WIP) +Wrap your policies for use with RLlib (Shelved until RLlib is more stable) .. automodule:: pufferlib.frameworks.rllib :members: :undoc-members: - :noindex: \ No newline at end of file + :noindex: + +SB3 Binding +########### + +Coming soon! diff --git a/docs/build/html/_sources/rst/blog.rst.txt b/docs/build/html/_sources/rst/blog.rst.txt index 86b1d2a..9b33d8b 100644 --- a/docs/build/html/_sources/rst/blog.rst.txt +++ b/docs/build/html/_sources/rst/blog.rst.txt @@ -11,6 +11,73 @@ +PufferLib 0.5: A Bigger EnvPool for Growing Puffers +################################################### + +This is what reinforcement learning does to your CPU utilization. + +.. figure:: ../_static/0-5_blog_header.png + +You wouldn’t pack a box this way, right? With PufferLib 0.5, we are releasing a Python implementation of EnvPool to solve this problem. **TL;DR: ~20% performance improvement across most workloads, up to 2x for complex environments, and native multiagent support.** + +.. figure:: ../_static/0-5_blog_envpool.png + +If you just want the enhancements, you can pip install -U pufferlib. But if you’d like to see a bit behind the curtain, read on! + +The Simulation Crisis +********************* + +You want to do some RL research, so you install Atari. Say it runs at 1000 steps/second on 1 core and 5000 steps/second on 6 cores. Now, you decide you want to work on a more interesting environment and happen upon Neural MMO, a brilliant project that must have been developed by a truly fantastic team. It runs at 1500 steps/second – faster than Atari! So you scale it up to 6 cores and it runs at … 1800 steps per second. What gives? + +The problem is that environments simulated on different cores do not run at the same speed. 
Even if they did, many modern CPUs have cores that run at different speeds. Parallelization overhead is mostly the sum of: +- Launching/synchronization overhead. This is roughly 0.1 ms per process and is linear in the number of processes. At ~100 steps per second, you can ignore it. At >10,000 steps/second, it is the main limiting factor. +- Environment variance. This is defined by the ratio std/mu of the environment simulation time and scales with the square root of the number of processes. For 24 processes, 10% std is 20% overhead and 100% std is 300% overhead. +- Different core speeds. Many modern CPUs, especially Intel desktop series processors, feature additional cores that are ~20% slower than the main cores. +- Model latency. This is the time taken to transfer observations to GPU, run the model, and transfer actions to CPU. It is not technically part of multiprocessing overhead, but naive implementations will leave CPUs idle during model inference. + +As a rule of thumb, simple RL environments have < 10% variance because the code is always simulating roughly the same thing. Complex environments, especially ones with variable numbers of agents, can have > 100% variance because different code runs depending on the current state. On the other hand, if your environment has 100 agents, you are effectively running 100x fewer simulations for the same data, so launching/synchronization overhead is lower. +The Solution +************ + +Run multiple environments per process if you have an environment running at > ~2000 sps with variance < ~10%. This will reduce the impact of launching/synchronization overhead and also reduce variance because you are summing over samples. In PufferLib, we typically enable this only for environments > ~5000 sps because of interactions with the optimizations below. + +Simulate multiple buffers of environments so that one buffer is running while your model is processing observations from the other. 
This technique was introduced by https://github.com/alex-petrenko/sample-factory and does not speed up simulation, but it allows you to interleave simulations from two sets of environments. It’s a good trick, but it is superseded by the final optimization, which is faster and simpler. + +Run a pool of environments and sample from the first ones to finish stepping. For example, if you want a batch of 24 observations, you might run 64 environments. At each step, the 24 for which you have computed actions are going to take a while to simulate, but you can still select the fastest 24 from the other 64-24=40 environments. This technique was introduced by https://github.com/sail-sg/envpool and is massively effective, but the original implementation is only for specific C/C++ environments. PufferLib’s implementation is in Python, so it is slower, but it works for arbitrary Python environments and includes native multiagent support. + +Experiments +*********** + +To evaluate the performance of different backends, I am using a 13900k (24 cores) on a max specced Maingear desktop running a minimal Debian 12 install. We test 9 different simulated environments: 1e-2 to 1e-4 mean delay with 0-100% delay std. For each environment, we spawn 1, 6, 24, 96, and 192 processes for each backend tested (Gymnasium’s and PufferLib’s serial and multiprocessing implementations + PufferLib’s pool). We also have Ray implementations compatible with our pooling code, but that will be a separate post. Additionally, PufferLib implementations sweep over (1, 2, 4) environments per process and PufferLib pool will compute 24 observations at a time. We do not consider model latency, which can yield another 2x relative performance for pooling on specific workloads. + +.. figure:: ../_static/0-5_blog_envpool.png + +9 groups of bars, each for one environment. 5 groups of bars per environment, each for a specific number of processes. The serial Gymnasium/PufferLib experiments match in all cases. 
The best PufferLib settings are 10-20% faster than the best Gymnasium settings for all workloads and can be up to 2x faster for environments with a high standard deviation in important cases (for instance, you may not want to run 192 copies of heavy environments). Again, this is before even considering the time saved by interleaving with the model forward pass. + +All of the implementations start to dip ~10% at 1,000 steps/second and ~50% at 10,000 steps/second. To make absolutely sure that this overhead is unavoidable, I reimplemented the entire pool architecture as minimally as possible, without any of the environment wrapper or data transfer overhead: + +SPS: 10734.36 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: False +SPS: 11640.42 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: True +SPS: 32715.65 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: False +SPS: 27635.31 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: True +SPS: 22681.48 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: False +SPS: 26183.73 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 24 sync: False +SPS: 30120.75 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: True + +As it turns out, Python’s multiprocessing caps around 10,000 steps per second per worker. There is still room for improvement by running more environments per process, but at this speed, small optimizations to the data processing code start to matter much more. + +Technical Details and Gotchas +***************************** + +PufferLib’s vectorization library is extremely concise – around 800 lines for serial, multiprocessing, and ray backends with support for PufferLib’s Gymnasium and PettingZoo wrappers. 
Adding envpool only required changing around 100 lines of code but required a lot of performance testing: +Don’t use multiprocessing.Queue. There’s no fast way to poll which processes are done. Instead, use multiprocessing.Pipe and poll with selectors. I have not seen noticeable overhead from this in any of my tests. +Don’t use time.sleep(), as this will trigger context switching, or time.time(), as this will include time spent on other processes. Use time.process_time() if you want an equal slice per core or count to ~150M/second (time it on your machine) if you want a fixed amount of work. + +The ray backend was extremely easy to implement thanks to ray.wait(). It is unfortunately too slow for most environments, but I wish standard multiprocessing used the Ray API, if not the architecture. The library itself has some cleanup issues that can cause crashes during heavy performance tests, which is why results are not included in this post. + +There’s one other thing I want to mention for people looking at the code. I was doing some experimental procedural stuff testing different programming paradigms, so the actual class interfaces are in __init__. It’s pretty much equivalent to one subclass per backend. + PufferLib 0.4: Ready to Take on Bigger Fish ########################################### diff --git a/docs/build/html/_sources/rst/landing.rst.txt b/docs/build/html/_sources/rst/landing.rst.txt index 4b4661f..d4d72ab 100644 --- a/docs/build/html/_sources/rst/landing.rst.txt +++ b/docs/build/html/_sources/rst/landing.rst.txt @@ -44,7 +44,7 @@ You have an environment, a PyTorch model, and a reinforcement learning library t | -Join our community Discord for support and Discussion, follow my Twitter for news, and star the repo to feed the puffer. :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` appearing at NeurIPS 2023 ALOE Workshop. Come say hi! 
+Join our community Discord for support and Discussion, follow my Twitter for news, and star the repo to feed the puffer. We also have a :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` featured at the NeurIPS 2023 ALOE workshop. .. dropdown:: Installation @@ -54,7 +54,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new `PufferTank `_ is a GPU container with PufferLib and dependencies for all environments in the registry, including some that are slow and tricky to install. - If you are new to containers, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you. + If you have not used containers before and just want everything to work, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you. .. tab-item:: Pip @@ -76,7 +76,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new **Joseph Suarez**: Creator and developer of PufferLib - **David Bloomin**: Policy pool/store/selector + **David Bloomin**: 0.4 policy pool/store/selector **Nick Jenkins**: Layout for the system architecture diagram. Adversary.design. @@ -86,40 +86,45 @@ Join our community Discord for support and Discussion, follow my Twitter for new **You can open this guide in a Colab notebook by clicking the demo button at the top of this page** -Complex environments may have heirarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations and actions and a constant number of agents, with no changes to the underlying environment. 
Here's how it works with two notoriously complex environments, NetHack and Neural MMO. +Complex environments may have hierarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations/actions and a constant number of agents. Here's how it works with NetHack and Neural MMO, two notoriously complex environments. .. code-block:: python import pufferlib.emulation + import pufferlib.wrappers import nle, nmmo def nmmo_creator(): - return pufferlib.emulation.PettingZooPufferEnv(env_creator=nmmo.Env) + env = nmmo.Env() + env = pufferlib.wrappers.PettingZooTruncatedWrapper(env) + return pufferlib.emulation.PettingZooPufferEnv(env=env) def nethack_creator(): - return pufferlib.emulation.GymPufferEnv(env_creator=nle.env.NLE) + return pufferlib.emulation.GymnasiumPufferEnv(env_creator=nle.env.NLE) -You can pass envs by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options. +The wrappers give you back a Gymnasium/PettingZoo compliant environment. There is no loss of generality and no change to the underlying environment. You can wrap environments by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options. .. 
code-block:: python import pufferlib.vectorization - # vec = pufferlib.vectorization.Serial - vec = pufferlib.vectorization.Multiprocessing + vec = pufferlib.vectorization.Serial + # vec = pufferlib.vectorization.Multiprocessing # vec = pufferlib.vectorization.Ray - envs = vec(nmmo_creator, num_workers=2, envs_per_worker=2) + # Vectorization API. Specify total number of environments and number per worker + # Setting env_pool=True can be much faster but requires some tweaks to learning code + envs = vec(nmmo_creator, num_envs=4, envs_per_worker=2, env_pool=False) - sync = True - if sync: - obs = envs.reset() - else: - envs.async_reset() - obs, _, _, _ = envs.recv() + # Synchronous API - reset/step + # obs = envs.reset()[0] -We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine. + # Asynchronous API - async_reset, send/recv + envs.async_reset() + obs = envs.recv()[0] + +Our backends support asynchronous on-policy sampling through a Python implementation of EnvPool. This makes them *faster* than the implementations that ship with most RL libraries. We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine. PufferLib allows you to write vanilla PyTorch policies and use them with multiple learning libraries. We take care of the details of converting between the different APIs. Here's a policy that will work with *any* environment, with a one-line wrapper for CleanRL. 
@@ -132,7 +137,7 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl import pufferlib.frameworks.cleanrl class Policy(nn.Module): - def __init__(self, envs): + def __init__(self, env): super().__init__() self.encoder = nn.Linear(np.prod( envs.single_observation_space.shape), 128) @@ -151,12 +156,10 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl policy = Policy(envs.driver_env) cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy) actions = cleanrl_policy.get_action_and_value(obs)[0].numpy() - obs, rewards, dones, infos = envs.step(actions) + obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions) envs.close() -There's also a lightweight, fully optional base policy class for PufferLib. It breaks the forward pass into two functions, encode_observations and decode_actions. The advantage of this is that it lets us handle recurrance for you, since every framework does this a bit differently. - -So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide a registry of environments and models. Here's a complete example. +There's also an optional policy base class for PufferLib. It just breaks the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here's a complete example. .. 
code-block:: python @@ -165,33 +168,32 @@ So far, the code above is fully general and does not rely on PufferLib support f import pufferlib.models import pufferlib.vectorization import pufferlib.frameworks.cleanrl - import pufferlib.registry.nmmo + import pufferlib.environments.nmmo envs = pufferlib.vectorization.Multiprocessing( - env_creator=pufferlib.registry.nmmo.make_env, - num_workers=2, envs_per_worker=2) + env_creator=pufferlib.environments.nmmo.make_env, + num_envs=4, envs_per_worker=2) - policy = pufferlib.registry.nmmo.Policy(envs.driver_env) - policy = pufferlib.models.RecurrentWrapper(envs, policy, - input_size=256, hidden_size=256) - cleanrl_policy = pufferlib.frameworks.cleanrl.RecurrentPolicy(policy) + policy = pufferlib.environments.nmmo.Policy(envs.driver_env) + cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy) - obs = envs.reset() - obs = torch.Tensor(obs) - state = [torch.zeros((1, 256, 256)), torch.zeros((1, 256, 256))] - actions = cleanrl_policy.get_action_and_value(obs, state)[0].numpy() - obs, rewards, dones, infos = envs.step(actions) + env_outputs = envs.reset()[0] + obs = torch.Tensor(env_outputs) + actions = cleanrl_policy.get_action_and_value(obs)[0].numpy() + obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions) envs.close() -It's that simple -- almost. If you have an environment with structured observations, you'll hvae to unpack them in the network forward pass since PufferLif will flatten them in emulation. We provide a utility for this -- just be sure to save a reference to your environment inside of the model so you have access to the observation space. +It's that simple -- almost. If you have an environment with structured observations, you'll have to unpack them in the network forward pass since PufferLib will flatten them in emulation. We provide a utility for this. .. 
code-block:: python - env_outputs = pufferlib.emulation.unpack_batched_obs( - env_outputs, self.envs.flat_observation_space + obs = pufferlib.emulation.unpack_batched_obs( + env_outputs, + envs.driver_env.flat_observation_space, + envs.driver_env.flat_observation_structure ) -That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. +That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. SB3 and other integrations coming soon! Libraries ######### @@ -223,7 +225,7 @@ PufferLib provides *pufferlib.frameworks* for the the learning libraries below. Or view it on GitHub `here `_ -We are also working on a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. It's still under development, but you can try it out `here `_ +PufferLib also includes a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. You can try it out `here `_ .. raw:: html @@ -238,12 +240,12 @@ We are also working on a heavily customized version of CleanRL PPO with support -While RLlib is great on paper, there are currently a few issues. The pre-gymnasium 2.0 release is very buggy and has next to no error checking on the user API. The latest version may be more stable, but it pins a very recent version of Gymnasium that breaks compatiblity with many environments. We have a simple running script `here `_ that works with 2.0 for now. We will update this when the situation improves. 
+We have previously supported RLlib and may again in the future. RLlib has not received updates in a while, and the current release is very buggy. We will update this if the situation improves. Environments ############ -We also provide a registry of environments and models that are supported out of the box. These environments are already set up for you in PufferTank and are used in our test cases to ensure they work with PufferLib. Several also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines. +We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines. .. raw:: html @@ -261,12 +263,12 @@ We also provide a registry of environments and models that are supported out of
-

Arcade Learning Environment provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.

+

Pokemon Red is one of the original Pokemon games for gameboy. This project uses the game as an environment for reinforcement learning. We are actively supporting development on this one!

@@ -283,12 +285,23 @@ We also provide a registry of environments and models that are supported out of
-

Neural MMO is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.

+

Arcade Learning Environment provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.

+
+
+ +
+
+ + Star Minigrid + +
+
+

Minigrid is a 2D grid-world environment engine and a collection of builtin environments. The target is flexible and computationally efficient RL research.

@@ -303,6 +316,17 @@ We also provide a registry of environments and models that are supported out of +
+
+ + Star Neural MMO + +
+
+

Neural MMO is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.

+
+
+ +
+ +
+

MiniHack Learning Environment is a stripped down version of NetHack with support for level editing and custom procedural generation.

+
+
+
@@ -362,11 +397,9 @@ Current Limitations ################### - No continuous action spaces (WIP) -- Pre-gymnasium Gym and PettingZoo only (WIP) - Support for heterogenous observations and actions requires you to specify teams such that each team has the same observation and action space. There's no good way around this. License ####### -PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI; we do not have private repositories with additional utilities. - +PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public and we do not have private repositories with additional utilities. diff --git a/docs/build/html/_static/0-5_blog_envpool.png b/docs/build/html/_static/0-5_blog_envpool.png new file mode 100644 index 0000000..46a472e Binary files /dev/null and b/docs/build/html/_static/0-5_blog_envpool.png differ diff --git a/docs/build/html/_static/0-5_blog_header.png b/docs/build/html/_static/0-5_blog_header.png new file mode 100644 index 0000000..1c49024 Binary files /dev/null and b/docs/build/html/_static/0-5_blog_header.png differ diff --git a/docs/build/html/_static/documentation_options.js b/docs/build/html/_static/documentation_options.js index 7696827..0ce3c19 100644 --- a/docs/build/html/_static/documentation_options.js +++ b/docs/build/html/_static/documentation_options.js @@ -1,6 +1,6 @@ var DOCUMENTATION_OPTIONS = { URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'), - VERSION: '0.4.3', + VERSION: '0.5.0', LANGUAGE: 'en', COLLAPSE_INDEX: false, BUILDER: 'html', diff --git a/docs/build/html/_static/pygments.css b/docs/build/html/_static/pygments.css index 9c45769..5c8cad8 100644 --- a/docs/build/html/_static/pygments.css +++ b/docs/build/html/_static/pygments.css @@ -22,6 +22,7 @@ .highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */ .highlight .gd { 
color: #a40000 } /* Generic.Deleted */ .highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */ +.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ .highlight .gr { color: #ef2929 } /* Generic.Error */ .highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ .highlight .gi { color: #00A000 } /* Generic.Inserted */ @@ -89,35 +90,36 @@ body[data-theme="dark"] .highlight td.linenos .special { color: #000000; backgro body[data-theme="dark"] .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } body[data-theme="dark"] .highlight .hll { background-color: #49483e } body[data-theme="dark"] .highlight { background: #272822; color: #f8f8f2 } -body[data-theme="dark"] .highlight .c { color: #75715e } /* Comment */ -body[data-theme="dark"] .highlight .err { color: #960050; background-color: #1e0010 } /* Error */ +body[data-theme="dark"] .highlight .c { color: #959077 } /* Comment */ +body[data-theme="dark"] .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */ body[data-theme="dark"] .highlight .esc { color: #f8f8f2 } /* Escape */ body[data-theme="dark"] .highlight .g { color: #f8f8f2 } /* Generic */ body[data-theme="dark"] .highlight .k { color: #66d9ef } /* Keyword */ body[data-theme="dark"] .highlight .l { color: #ae81ff } /* Literal */ body[data-theme="dark"] .highlight .n { color: #f8f8f2 } /* Name */ -body[data-theme="dark"] .highlight .o { color: #f92672 } /* Operator */ +body[data-theme="dark"] .highlight .o { color: #ff4689 } /* Operator */ body[data-theme="dark"] .highlight .x { color: #f8f8f2 } /* Other */ body[data-theme="dark"] .highlight .p { color: #f8f8f2 } /* Punctuation */ -body[data-theme="dark"] .highlight .ch { color: #75715e } /* Comment.Hashbang */ -body[data-theme="dark"] .highlight .cm { color: #75715e } /* Comment.Multiline */ -body[data-theme="dark"] .highlight .cp { color: #75715e } /* 
Comment.Preproc */ -body[data-theme="dark"] .highlight .cpf { color: #75715e } /* Comment.PreprocFile */ -body[data-theme="dark"] .highlight .c1 { color: #75715e } /* Comment.Single */ -body[data-theme="dark"] .highlight .cs { color: #75715e } /* Comment.Special */ -body[data-theme="dark"] .highlight .gd { color: #f92672 } /* Generic.Deleted */ +body[data-theme="dark"] .highlight .ch { color: #959077 } /* Comment.Hashbang */ +body[data-theme="dark"] .highlight .cm { color: #959077 } /* Comment.Multiline */ +body[data-theme="dark"] .highlight .cp { color: #959077 } /* Comment.Preproc */ +body[data-theme="dark"] .highlight .cpf { color: #959077 } /* Comment.PreprocFile */ +body[data-theme="dark"] .highlight .c1 { color: #959077 } /* Comment.Single */ +body[data-theme="dark"] .highlight .cs { color: #959077 } /* Comment.Special */ +body[data-theme="dark"] .highlight .gd { color: #ff4689 } /* Generic.Deleted */ body[data-theme="dark"] .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */ +body[data-theme="dark"] .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ body[data-theme="dark"] .highlight .gr { color: #f8f8f2 } /* Generic.Error */ body[data-theme="dark"] .highlight .gh { color: #f8f8f2 } /* Generic.Heading */ body[data-theme="dark"] .highlight .gi { color: #a6e22e } /* Generic.Inserted */ body[data-theme="dark"] .highlight .go { color: #66d9ef } /* Generic.Output */ -body[data-theme="dark"] .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */ +body[data-theme="dark"] .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */ body[data-theme="dark"] .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */ -body[data-theme="dark"] .highlight .gu { color: #75715e } /* Generic.Subheading */ +body[data-theme="dark"] .highlight .gu { color: #959077 } /* Generic.Subheading */ body[data-theme="dark"] .highlight .gt { color: #f8f8f2 } /* 
Generic.Traceback */ body[data-theme="dark"] .highlight .kc { color: #66d9ef } /* Keyword.Constant */ body[data-theme="dark"] .highlight .kd { color: #66d9ef } /* Keyword.Declaration */ -body[data-theme="dark"] .highlight .kn { color: #f92672 } /* Keyword.Namespace */ +body[data-theme="dark"] .highlight .kn { color: #ff4689 } /* Keyword.Namespace */ body[data-theme="dark"] .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */ body[data-theme="dark"] .highlight .kr { color: #66d9ef } /* Keyword.Reserved */ body[data-theme="dark"] .highlight .kt { color: #66d9ef } /* Keyword.Type */ @@ -136,9 +138,9 @@ body[data-theme="dark"] .highlight .nl { color: #f8f8f2 } /* Name.Label */ body[data-theme="dark"] .highlight .nn { color: #f8f8f2 } /* Name.Namespace */ body[data-theme="dark"] .highlight .nx { color: #a6e22e } /* Name.Other */ body[data-theme="dark"] .highlight .py { color: #f8f8f2 } /* Name.Property */ -body[data-theme="dark"] .highlight .nt { color: #f92672 } /* Name.Tag */ +body[data-theme="dark"] .highlight .nt { color: #ff4689 } /* Name.Tag */ body[data-theme="dark"] .highlight .nv { color: #f8f8f2 } /* Name.Variable */ -body[data-theme="dark"] .highlight .ow { color: #f92672 } /* Operator.Word */ +body[data-theme="dark"] .highlight .ow { color: #ff4689 } /* Operator.Word */ body[data-theme="dark"] .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */ body[data-theme="dark"] .highlight .w { color: #f8f8f2 } /* Text.Whitespace */ body[data-theme="dark"] .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */ @@ -174,35 +176,36 @@ body:not([data-theme="light"]) .highlight td.linenos .special { color: #000000; body:not([data-theme="light"]) .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } body:not([data-theme="light"]) .highlight .hll { background-color: #49483e } body:not([data-theme="light"]) .highlight { background: #272822; color: #f8f8f2 } -body:not([data-theme="light"]) .highlight 
.c { color: #75715e } /* Comment */ -body:not([data-theme="light"]) .highlight .err { color: #960050; background-color: #1e0010 } /* Error */ +body:not([data-theme="light"]) .highlight .c { color: #959077 } /* Comment */ +body:not([data-theme="light"]) .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */ body:not([data-theme="light"]) .highlight .esc { color: #f8f8f2 } /* Escape */ body:not([data-theme="light"]) .highlight .g { color: #f8f8f2 } /* Generic */ body:not([data-theme="light"]) .highlight .k { color: #66d9ef } /* Keyword */ body:not([data-theme="light"]) .highlight .l { color: #ae81ff } /* Literal */ body:not([data-theme="light"]) .highlight .n { color: #f8f8f2 } /* Name */ -body:not([data-theme="light"]) .highlight .o { color: #f92672 } /* Operator */ +body:not([data-theme="light"]) .highlight .o { color: #ff4689 } /* Operator */ body:not([data-theme="light"]) .highlight .x { color: #f8f8f2 } /* Other */ body:not([data-theme="light"]) .highlight .p { color: #f8f8f2 } /* Punctuation */ -body:not([data-theme="light"]) .highlight .ch { color: #75715e } /* Comment.Hashbang */ -body:not([data-theme="light"]) .highlight .cm { color: #75715e } /* Comment.Multiline */ -body:not([data-theme="light"]) .highlight .cp { color: #75715e } /* Comment.Preproc */ -body:not([data-theme="light"]) .highlight .cpf { color: #75715e } /* Comment.PreprocFile */ -body:not([data-theme="light"]) .highlight .c1 { color: #75715e } /* Comment.Single */ -body:not([data-theme="light"]) .highlight .cs { color: #75715e } /* Comment.Special */ -body:not([data-theme="light"]) .highlight .gd { color: #f92672 } /* Generic.Deleted */ +body:not([data-theme="light"]) .highlight .ch { color: #959077 } /* Comment.Hashbang */ +body:not([data-theme="light"]) .highlight .cm { color: #959077 } /* Comment.Multiline */ +body:not([data-theme="light"]) .highlight .cp { color: #959077 } /* Comment.Preproc */ +body:not([data-theme="light"]) .highlight .cpf { color: #959077 } /* 
Comment.PreprocFile */ +body:not([data-theme="light"]) .highlight .c1 { color: #959077 } /* Comment.Single */ +body:not([data-theme="light"]) .highlight .cs { color: #959077 } /* Comment.Special */ +body:not([data-theme="light"]) .highlight .gd { color: #ff4689 } /* Generic.Deleted */ body:not([data-theme="light"]) .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */ +body:not([data-theme="light"]) .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ body:not([data-theme="light"]) .highlight .gr { color: #f8f8f2 } /* Generic.Error */ body:not([data-theme="light"]) .highlight .gh { color: #f8f8f2 } /* Generic.Heading */ body:not([data-theme="light"]) .highlight .gi { color: #a6e22e } /* Generic.Inserted */ body:not([data-theme="light"]) .highlight .go { color: #66d9ef } /* Generic.Output */ -body:not([data-theme="light"]) .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */ +body:not([data-theme="light"]) .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */ body:not([data-theme="light"]) .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */ -body:not([data-theme="light"]) .highlight .gu { color: #75715e } /* Generic.Subheading */ +body:not([data-theme="light"]) .highlight .gu { color: #959077 } /* Generic.Subheading */ body:not([data-theme="light"]) .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */ body:not([data-theme="light"]) .highlight .kc { color: #66d9ef } /* Keyword.Constant */ body:not([data-theme="light"]) .highlight .kd { color: #66d9ef } /* Keyword.Declaration */ -body:not([data-theme="light"]) .highlight .kn { color: #f92672 } /* Keyword.Namespace */ +body:not([data-theme="light"]) .highlight .kn { color: #ff4689 } /* Keyword.Namespace */ body:not([data-theme="light"]) .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */ body:not([data-theme="light"]) .highlight .kr { color: #66d9ef } /* Keyword.Reserved */ 
body:not([data-theme="light"]) .highlight .kt { color: #66d9ef } /* Keyword.Type */ @@ -221,9 +224,9 @@ body:not([data-theme="light"]) .highlight .nl { color: #f8f8f2 } /* Name.Label * body:not([data-theme="light"]) .highlight .nn { color: #f8f8f2 } /* Name.Namespace */ body:not([data-theme="light"]) .highlight .nx { color: #a6e22e } /* Name.Other */ body:not([data-theme="light"]) .highlight .py { color: #f8f8f2 } /* Name.Property */ -body:not([data-theme="light"]) .highlight .nt { color: #f92672 } /* Name.Tag */ +body:not([data-theme="light"]) .highlight .nt { color: #ff4689 } /* Name.Tag */ body:not([data-theme="light"]) .highlight .nv { color: #f8f8f2 } /* Name.Variable */ -body:not([data-theme="light"]) .highlight .ow { color: #f92672 } /* Operator.Word */ +body:not([data-theme="light"]) .highlight .ow { color: #ff4689 } /* Operator.Word */ body:not([data-theme="light"]) .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */ body:not([data-theme="light"]) .highlight .w { color: #f8f8f2 } /* Text.Whitespace */ body:not([data-theme="light"]) .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */ diff --git a/docs/build/html/genindex.html b/docs/build/html/genindex.html index e47cc15..006f471 100644 --- a/docs/build/html/genindex.html +++ b/docs/build/html/genindex.html @@ -4,7 +4,7 @@ - Index - PufferLib 0.4.3 documentation + Index - PufferLib 0.5.0 documentation @@ -188,7 +188,7 @@
@@ -211,7 +211,7 @@
@@ -213,7 +213,7 @@

Blog

    -
  • PufferLib 0.4: Ready to Take on Bigger Fish
      +
    • PufferLib 0.5: A Bigger EnvPool for Growing Puffers +
    • +
    • PufferLib 0.4: Ready to Take on Bigger Fish
      • Emulation
      • Vectorization
      • PufferTank
      • diff --git a/docs/build/html/objects.inv b/docs/build/html/objects.inv index 29bc024..168899d 100644 Binary files a/docs/build/html/objects.inv and b/docs/build/html/objects.inv differ diff --git a/docs/build/html/rst/_static/0-5_blog_envpool.png b/docs/build/html/rst/_static/0-5_blog_envpool.png new file mode 100644 index 0000000..46a472e Binary files /dev/null and b/docs/build/html/rst/_static/0-5_blog_envpool.png differ diff --git a/docs/build/html/rst/_static/0-5_blog_header.png b/docs/build/html/rst/_static/0-5_blog_header.png new file mode 100644 index 0000000..1c49024 Binary files /dev/null and b/docs/build/html/rst/_static/0-5_blog_header.png differ diff --git a/docs/build/html/rst/_static/documentation_options.js b/docs/build/html/rst/_static/documentation_options.js index 7696827..0ce3c19 100644 --- a/docs/build/html/rst/_static/documentation_options.js +++ b/docs/build/html/rst/_static/documentation_options.js @@ -1,6 +1,6 @@ var DOCUMENTATION_OPTIONS = { URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'), - VERSION: '0.4.3', + VERSION: '0.5.0', LANGUAGE: 'en', COLLAPSE_INDEX: false, BUILDER: 'html', diff --git a/docs/build/html/rst/_static/pygments.css b/docs/build/html/rst/_static/pygments.css index 9c45769..5c8cad8 100644 --- a/docs/build/html/rst/_static/pygments.css +++ b/docs/build/html/rst/_static/pygments.css @@ -22,6 +22,7 @@ .highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */ .highlight .gd { color: #a40000 } /* Generic.Deleted */ .highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */ +.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ .highlight .gr { color: #ef2929 } /* Generic.Error */ .highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ .highlight .gi { color: #00A000 } /* Generic.Inserted */ @@ -89,35 +90,36 @@ body[data-theme="dark"] .highlight td.linenos .special { color: #000000; 
backgro body[data-theme="dark"] .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } body[data-theme="dark"] .highlight .hll { background-color: #49483e } body[data-theme="dark"] .highlight { background: #272822; color: #f8f8f2 } -body[data-theme="dark"] .highlight .c { color: #75715e } /* Comment */ -body[data-theme="dark"] .highlight .err { color: #960050; background-color: #1e0010 } /* Error */ +body[data-theme="dark"] .highlight .c { color: #959077 } /* Comment */ +body[data-theme="dark"] .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */ body[data-theme="dark"] .highlight .esc { color: #f8f8f2 } /* Escape */ body[data-theme="dark"] .highlight .g { color: #f8f8f2 } /* Generic */ body[data-theme="dark"] .highlight .k { color: #66d9ef } /* Keyword */ body[data-theme="dark"] .highlight .l { color: #ae81ff } /* Literal */ body[data-theme="dark"] .highlight .n { color: #f8f8f2 } /* Name */ -body[data-theme="dark"] .highlight .o { color: #f92672 } /* Operator */ +body[data-theme="dark"] .highlight .o { color: #ff4689 } /* Operator */ body[data-theme="dark"] .highlight .x { color: #f8f8f2 } /* Other */ body[data-theme="dark"] .highlight .p { color: #f8f8f2 } /* Punctuation */ -body[data-theme="dark"] .highlight .ch { color: #75715e } /* Comment.Hashbang */ -body[data-theme="dark"] .highlight .cm { color: #75715e } /* Comment.Multiline */ -body[data-theme="dark"] .highlight .cp { color: #75715e } /* Comment.Preproc */ -body[data-theme="dark"] .highlight .cpf { color: #75715e } /* Comment.PreprocFile */ -body[data-theme="dark"] .highlight .c1 { color: #75715e } /* Comment.Single */ -body[data-theme="dark"] .highlight .cs { color: #75715e } /* Comment.Special */ -body[data-theme="dark"] .highlight .gd { color: #f92672 } /* Generic.Deleted */ +body[data-theme="dark"] .highlight .ch { color: #959077 } /* Comment.Hashbang */ +body[data-theme="dark"] .highlight .cm { color: #959077 } /* 
Comment.Multiline */ +body[data-theme="dark"] .highlight .cp { color: #959077 } /* Comment.Preproc */ +body[data-theme="dark"] .highlight .cpf { color: #959077 } /* Comment.PreprocFile */ +body[data-theme="dark"] .highlight .c1 { color: #959077 } /* Comment.Single */ +body[data-theme="dark"] .highlight .cs { color: #959077 } /* Comment.Special */ +body[data-theme="dark"] .highlight .gd { color: #ff4689 } /* Generic.Deleted */ body[data-theme="dark"] .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */ +body[data-theme="dark"] .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ body[data-theme="dark"] .highlight .gr { color: #f8f8f2 } /* Generic.Error */ body[data-theme="dark"] .highlight .gh { color: #f8f8f2 } /* Generic.Heading */ body[data-theme="dark"] .highlight .gi { color: #a6e22e } /* Generic.Inserted */ body[data-theme="dark"] .highlight .go { color: #66d9ef } /* Generic.Output */ -body[data-theme="dark"] .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */ +body[data-theme="dark"] .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */ body[data-theme="dark"] .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */ -body[data-theme="dark"] .highlight .gu { color: #75715e } /* Generic.Subheading */ +body[data-theme="dark"] .highlight .gu { color: #959077 } /* Generic.Subheading */ body[data-theme="dark"] .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */ body[data-theme="dark"] .highlight .kc { color: #66d9ef } /* Keyword.Constant */ body[data-theme="dark"] .highlight .kd { color: #66d9ef } /* Keyword.Declaration */ -body[data-theme="dark"] .highlight .kn { color: #f92672 } /* Keyword.Namespace */ +body[data-theme="dark"] .highlight .kn { color: #ff4689 } /* Keyword.Namespace */ body[data-theme="dark"] .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */ body[data-theme="dark"] .highlight .kr { color: #66d9ef } /* 
Keyword.Reserved */ body[data-theme="dark"] .highlight .kt { color: #66d9ef } /* Keyword.Type */ @@ -136,9 +138,9 @@ body[data-theme="dark"] .highlight .nl { color: #f8f8f2 } /* Name.Label */ body[data-theme="dark"] .highlight .nn { color: #f8f8f2 } /* Name.Namespace */ body[data-theme="dark"] .highlight .nx { color: #a6e22e } /* Name.Other */ body[data-theme="dark"] .highlight .py { color: #f8f8f2 } /* Name.Property */ -body[data-theme="dark"] .highlight .nt { color: #f92672 } /* Name.Tag */ +body[data-theme="dark"] .highlight .nt { color: #ff4689 } /* Name.Tag */ body[data-theme="dark"] .highlight .nv { color: #f8f8f2 } /* Name.Variable */ -body[data-theme="dark"] .highlight .ow { color: #f92672 } /* Operator.Word */ +body[data-theme="dark"] .highlight .ow { color: #ff4689 } /* Operator.Word */ body[data-theme="dark"] .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */ body[data-theme="dark"] .highlight .w { color: #f8f8f2 } /* Text.Whitespace */ body[data-theme="dark"] .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */ @@ -174,35 +176,36 @@ body:not([data-theme="light"]) .highlight td.linenos .special { color: #000000; body:not([data-theme="light"]) .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } body:not([data-theme="light"]) .highlight .hll { background-color: #49483e } body:not([data-theme="light"]) .highlight { background: #272822; color: #f8f8f2 } -body:not([data-theme="light"]) .highlight .c { color: #75715e } /* Comment */ -body:not([data-theme="light"]) .highlight .err { color: #960050; background-color: #1e0010 } /* Error */ +body:not([data-theme="light"]) .highlight .c { color: #959077 } /* Comment */ +body:not([data-theme="light"]) .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */ body:not([data-theme="light"]) .highlight .esc { color: #f8f8f2 } /* Escape */ body:not([data-theme="light"]) .highlight .g { color: #f8f8f2 } /* Generic */ 
body:not([data-theme="light"]) .highlight .k { color: #66d9ef } /* Keyword */ body:not([data-theme="light"]) .highlight .l { color: #ae81ff } /* Literal */ body:not([data-theme="light"]) .highlight .n { color: #f8f8f2 } /* Name */ -body:not([data-theme="light"]) .highlight .o { color: #f92672 } /* Operator */ +body:not([data-theme="light"]) .highlight .o { color: #ff4689 } /* Operator */ body:not([data-theme="light"]) .highlight .x { color: #f8f8f2 } /* Other */ body:not([data-theme="light"]) .highlight .p { color: #f8f8f2 } /* Punctuation */ -body:not([data-theme="light"]) .highlight .ch { color: #75715e } /* Comment.Hashbang */ -body:not([data-theme="light"]) .highlight .cm { color: #75715e } /* Comment.Multiline */ -body:not([data-theme="light"]) .highlight .cp { color: #75715e } /* Comment.Preproc */ -body:not([data-theme="light"]) .highlight .cpf { color: #75715e } /* Comment.PreprocFile */ -body:not([data-theme="light"]) .highlight .c1 { color: #75715e } /* Comment.Single */ -body:not([data-theme="light"]) .highlight .cs { color: #75715e } /* Comment.Special */ -body:not([data-theme="light"]) .highlight .gd { color: #f92672 } /* Generic.Deleted */ +body:not([data-theme="light"]) .highlight .ch { color: #959077 } /* Comment.Hashbang */ +body:not([data-theme="light"]) .highlight .cm { color: #959077 } /* Comment.Multiline */ +body:not([data-theme="light"]) .highlight .cp { color: #959077 } /* Comment.Preproc */ +body:not([data-theme="light"]) .highlight .cpf { color: #959077 } /* Comment.PreprocFile */ +body:not([data-theme="light"]) .highlight .c1 { color: #959077 } /* Comment.Single */ +body:not([data-theme="light"]) .highlight .cs { color: #959077 } /* Comment.Special */ +body:not([data-theme="light"]) .highlight .gd { color: #ff4689 } /* Generic.Deleted */ body:not([data-theme="light"]) .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */ +body:not([data-theme="light"]) .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: 
italic } /* Generic.EmphStrong */ body:not([data-theme="light"]) .highlight .gr { color: #f8f8f2 } /* Generic.Error */ body:not([data-theme="light"]) .highlight .gh { color: #f8f8f2 } /* Generic.Heading */ body:not([data-theme="light"]) .highlight .gi { color: #a6e22e } /* Generic.Inserted */ body:not([data-theme="light"]) .highlight .go { color: #66d9ef } /* Generic.Output */ -body:not([data-theme="light"]) .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */ +body:not([data-theme="light"]) .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */ body:not([data-theme="light"]) .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */ -body:not([data-theme="light"]) .highlight .gu { color: #75715e } /* Generic.Subheading */ +body:not([data-theme="light"]) .highlight .gu { color: #959077 } /* Generic.Subheading */ body:not([data-theme="light"]) .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */ body:not([data-theme="light"]) .highlight .kc { color: #66d9ef } /* Keyword.Constant */ body:not([data-theme="light"]) .highlight .kd { color: #66d9ef } /* Keyword.Declaration */ -body:not([data-theme="light"]) .highlight .kn { color: #f92672 } /* Keyword.Namespace */ +body:not([data-theme="light"]) .highlight .kn { color: #ff4689 } /* Keyword.Namespace */ body:not([data-theme="light"]) .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */ body:not([data-theme="light"]) .highlight .kr { color: #66d9ef } /* Keyword.Reserved */ body:not([data-theme="light"]) .highlight .kt { color: #66d9ef } /* Keyword.Type */ @@ -221,9 +224,9 @@ body:not([data-theme="light"]) .highlight .nl { color: #f8f8f2 } /* Name.Label * body:not([data-theme="light"]) .highlight .nn { color: #f8f8f2 } /* Name.Namespace */ body:not([data-theme="light"]) .highlight .nx { color: #a6e22e } /* Name.Other */ body:not([data-theme="light"]) .highlight .py { color: #f8f8f2 } /* Name.Property */ -body:not([data-theme="light"]) .highlight .nt { color: 
#f92672 } /* Name.Tag */ +body:not([data-theme="light"]) .highlight .nt { color: #ff4689 } /* Name.Tag */ body:not([data-theme="light"]) .highlight .nv { color: #f8f8f2 } /* Name.Variable */ -body:not([data-theme="light"]) .highlight .ow { color: #f92672 } /* Operator.Word */ +body:not([data-theme="light"]) .highlight .ow { color: #ff4689 } /* Operator.Word */ body:not([data-theme="light"]) .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */ body:not([data-theme="light"]) .highlight .w { color: #f8f8f2 } /* Text.Whitespace */ body:not([data-theme="light"]) .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */ diff --git a/docs/build/html/rst/api.html b/docs/build/html/rst/api.html index 12a0de7..00ef2af 100644 --- a/docs/build/html/rst/api.html +++ b/docs/build/html/rst/api.html @@ -3,10 +3,10 @@ - + - Emulation - PufferLib 0.4.3 documentation + Emulation - PufferLib 0.5.0 documentation @@ -190,7 +190,7 @@
@@ -213,7 +213,7 @@ -
PufferLib 0.4: Ready to Take on Bigger Fish
+
PufferLib 0.5: A Bigger EnvPool for Growing Puffers
diff --git a/docs/build/html/rst/blog.html b/docs/build/html/rst/blog.html index 23783aa..204cb1c 100644 --- a/docs/build/html/rst/blog.html +++ b/docs/build/html/rst/blog.html @@ -6,7 +6,7 @@ - PufferLib 0.4: Ready to Take on Bigger Fish - PufferLib 0.4.3 documentation + PufferLib 0.5: A Bigger EnvPool for Growing Puffers - PufferLib 0.5.0 documentation @@ -190,7 +190,7 @@
@@ -213,7 +213,7 @@ +
@@ -559,6 +586,17 @@

Environments + +
+

MiniHack Learning Environment is a stripped-down version of NetHack with support for level editing and custom procedural generation.

+
+

+
@@ -595,13 +633,12 @@

Environments#

  • No continuous action spaces (WIP)

  • -
  • Pre-gymnasium Gym and PettingZoo only (WIP)

Support for heterogeneous observations and actions requires you to specify teams such that each team has the same observation and action space. There’s no good way around this.

License#

-

PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI; we do not have private repositories with additional utilities.

+

PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public and we do not have private repositories with additional utilities.

diff --git a/docs/build/html/search.html b/docs/build/html/search.html index 3a6aae3..1b40b0e 100644 --- a/docs/build/html/search.html +++ b/docs/build/html/search.html @@ -4,7 +4,7 @@ - Search - PufferLib 0.4.3 documentation + Search - PufferLib 0.5.0 documentation @@ -187,7 +187,7 @@
@@ -210,7 +210,7 @@
-While RLlib is great on paper, there are currently a few issues. The pre-gymnasium 2.0 release is very buggy and has next to no error checking on the user API. The latest version may be more stable, but it pins a very recent version of Gymnasium that breaks compatibility with many environments. We have a simple running script `here `_ that works with 2.0 for now. We will update this when the situation improves.
+We have previously supported RLlib and may again in the future. RLlib has not received updates in a while, and the current release is very buggy. We will update this if the situation improves.
 
 Environments
 ############
 
-We also provide a registry of environments and models that are supported out of the box. These environments are already set up for you in PufferTank and are used in our test cases to ensure they work with PufferLib. Several also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
+We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
 
 .. raw:: html
 
@@ -261,12 +263,12 @@ We also provide a registry of environments and models that are supported out of
-

Arcade Learning Environment provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.

+

Pokemon Red is one of the original Pokemon games for the Game Boy. This project uses the game as an environment for reinforcement learning. We are actively supporting development on this one!

@@ -283,12 +285,23 @@ We also provide a registry of environments and models that are supported out of
-

Neural MMO is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.

+

Arcade Learning Environment provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.

+
+
+ +
+
+ + Star Minigrid + +
+
+

Minigrid is a 2D grid-world environment engine and a collection of built-in environments. It targets flexible and computationally efficient RL research.

@@ -303,6 +316,17 @@ We also provide a registry of environments and models that are supported out of
+
+
+ + Star Neural MMO + +
+
+

Neural MMO is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.

+
+
+ +
+ +
+

MiniHack Learning Environment is a stripped-down version of NetHack with support for level editing and custom procedural generation.

+
+
+