diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
index bdf9754..afc1e24 100644
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
index c7cb16c..dba0f4c 100644
Binary files a/docs/build/doctrees/index.doctree and b/docs/build/doctrees/index.doctree differ
diff --git a/docs/build/doctrees/rst/api.doctree b/docs/build/doctrees/rst/api.doctree
index b069046..608833c 100644
Binary files a/docs/build/doctrees/rst/api.doctree and b/docs/build/doctrees/rst/api.doctree differ
diff --git a/docs/build/doctrees/rst/blog.doctree b/docs/build/doctrees/rst/blog.doctree
index fa0abdf..a65fe47 100644
Binary files a/docs/build/doctrees/rst/blog.doctree and b/docs/build/doctrees/rst/blog.doctree differ
diff --git a/docs/build/doctrees/rst/landing.doctree b/docs/build/doctrees/rst/landing.doctree
index b2ba35c..a53e495 100644
Binary files a/docs/build/doctrees/rst/landing.doctree and b/docs/build/doctrees/rst/landing.doctree differ
diff --git a/docs/build/html/.buildinfo b/docs/build/html/.buildinfo
index c6627a1..8037061 100644
--- a/docs/build/html/.buildinfo
+++ b/docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 4bbe4e1bdc45b7d096f5e7a0a5eb5873
+config: 6be11b3893c1b9feee4dcb2d0620068b
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/build/html/_images/0-5_blog_envpool.png b/docs/build/html/_images/0-5_blog_envpool.png
new file mode 100644
index 0000000..46a472e
Binary files /dev/null and b/docs/build/html/_images/0-5_blog_envpool.png differ
diff --git a/docs/build/html/_images/0-5_blog_header.png b/docs/build/html/_images/0-5_blog_header.png
new file mode 100644
index 0000000..1c49024
Binary files /dev/null and b/docs/build/html/_images/0-5_blog_header.png differ
diff --git a/docs/build/html/_sources/rst/api.rst.txt b/docs/build/html/_sources/rst/api.rst.txt
index 505e364..58dc7e0 100644
--- a/docs/build/html/_sources/rst/api.rst.txt
+++ b/docs/build/html/_sources/rst/api.rst.txt
@@ -9,7 +9,7 @@ Emulation
 
 Wrap your environments for broad compatibility. Supports passing creator functions, classes, or env objects. The API of the returned PufferEnv is the same as Gym/PettingZoo.
 
-.. autoclass:: pufferlib.emulation.GymPufferEnv
+.. autoclass:: pufferlib.emulation.GymnasiumPufferEnv
    :members:
    :undoc-members:
    :noindex:
@@ -19,93 +19,21 @@ Wrap your environments for broad compatibility. Supports passing creator functio
    :undoc-members:
    :noindex:
 
-Registry
-########
+Environments
+############
 
-make_env functions and policies for included environments.
+All included environments expose make_env and env_creator functions. make_env is the one you want most of the time. env_creator exposes, e.g., class interfaces for environments that support them, so that you can pass around static references.
 
-Atari
-*****
-
-.. automodule:: pufferlib.registry.atari
-   :members:
-   :undoc-members:
-   :noindex:
+Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have *custom* policies, and the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, NetHack/MiniHack, and Pokemon Red currently have reasonable policies.
 
+The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through e.g. pufferlib.environments.squared.make_env
 
-Butterfly
-*********
-
-.. automodule:: pufferlib.registry.butterfly
+.. automodule:: pufferlib.environments.squared.environment
    :members:
    :undoc-members:
    :noindex:
 
-
-Classic Control
-***************
-
-.. automodule:: pufferlib.registry.classic_control
-   :members:
-   :undoc-members:
-   :noindex:
-
-Crafter
-*******
-
-.. automodule:: pufferlib.registry.crafter
-   :members:
-   :undoc-members:
-   :noindex:
-
-Griddly
-*******
-
-.. automodule:: pufferlib.registry.griddly
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-MAgent
-******
-
-.. automodule:: pufferlib.registry.magent
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-MicroRTS
-********
-
-.. automodule:: pufferlib.registry.microrts
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-NetHack
-*******
-
-.. automodule:: pufferlib.registry.nethack
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-Neural MMO
-**********
-
-.. automodule:: pufferlib.registry.nmmo
-   :members:
-   :undoc-members:
-   :noindex:
-
-Procgen
-*******
-
-.. automodule:: pufferlib.registry.procgen
+.. autoclass:: pufferlib.environments.squared.torch.Policy
    :members:
    :undoc-members:
    :noindex:
@@ -113,7 +41,7 @@ Procgen
 Models
 ######
 
-PufferLib model API and default policies
+PufferLib's default policies and optional model API. These are not required to use PufferLib.
 
 .. automodule:: pufferlib.models
    :members:
@@ -150,7 +78,7 @@ Wrap your PyTorch policies for use with CleanRL
    :undoc-members:
    :noindex:
 
-Recurrence requires you to subclass our base policy instead. See the default policies for examples.
+Wrap your PyTorch policies for use with CleanRL and add an LSTM. This requires you to use our policy API, which is simple. See the default policies for examples.
 
 .. autoclass:: pufferlib.frameworks.cleanrl.RecurrentPolicy
    :members:
@@ -160,9 +88,14 @@ Recurrence requires you to subclass our base policy instead. See the default pol
 RLlib Binding
 #############
 
-Wrap your policies for use with RLlib (WIP)
+Wrap your policies for use with RLlib (Shelved until RLlib is more stable)
 
 .. automodule:: pufferlib.frameworks.rllib
    :members:
    :undoc-members:
-   :noindex:
\ No newline at end of file
+   :noindex:
+
+SB3 Binding
+###########
+
+Coming soon!
diff --git a/docs/build/html/_sources/rst/blog.rst.txt b/docs/build/html/_sources/rst/blog.rst.txt
index 86b1d2a..9b33d8b 100644
--- a/docs/build/html/_sources/rst/blog.rst.txt
+++ b/docs/build/html/_sources/rst/blog.rst.txt
@@ -11,6 +11,73 @@
      </video>
    </center>
 
+PufferLib 0.5: A Bigger EnvPool for Growing Puffers
+###################################################
+
+This is what reinforcement learning does to your CPU utilization.
+
+.. figure:: ../_static/0-5_blog_header.png
+
+You wouldn’t pack a box this way, right? With PufferLib 0.5, we are releasing a Python implementation of EnvPool to solve this problem. **TL;DR: ~20% performance improvement across most workloads, up to 2x for complex environments, and native multiagent support.**
+
+.. figure:: ../_static/0-5_blog_envpool.png
+
+If you just want the enhancements, you can pip install -U pufferlib. But if you’d like to see a bit behind the curtain, read on!
+
+The Simulation Crisis
+*********************
+
+You want to do some RL research, so you install Atari. Say it runs at 1000 steps/second on 1 core and 5000 steps/second on 6 cores. Now, you decide you want to work on a more interesting environment and happen upon Neural MMO, a brilliant project that must have been developed by a truly fantastic team. It runs at 1500 steps/second – faster than Atari! So you scale it up to 6 cores and it runs at … 1800 steps per second. What gives?
+
+The problem is that environments simulated on different cores do not run at the same speed. Even if they did, many modern CPUs have cores that run at different speeds. Parallelization overhead is mostly the sum of:
+- Launching/synchronization overhead. This is roughly 0.1 ms per process and is linear in the number of processes. At ~100 steps per second, you can ignore it. At >10,000 steps/second, it is the main limiting factor.
+- Environment variance. This is defined by the ratio std/mu of the environment simulation time and scales with the square root of the number of processes. For 24 processes, 10% std is 20% overhead and 100% std is 300% overhead.
+- Different core speeds. Many modern CPUs, especially Intel desktop series processors, feature additional cores that are ~20% slower than the main cores.
+- Model latency. This is the time taken to transfer observations to GPU, run the model, and transfer actions to CPU. It is not technically part of multiprocessing overhead, but naive implementations will leave CPUs idle during model inference.
+
+As a rule of thumb, simple RL environments have < 10% variance because the code is always simulating roughly the same thing. Complex environments, especially ones with variable numbers of agents, can have > 100% variance because different code runs depending on the current state. On the other hand, if your environment has 100 agents, you are effectively running 100x fewer simulations for the same data, so launching/synchronization overhead is lower.
+
+The Solution
+************
+
+Run multiple environments per process if your environment runs at more than ~2000 steps/second (sps) with variance < ~10%. This reduces the impact of launching/synchronization overhead and also reduces variance because you are summing over samples. In PufferLib, we typically enable this only for environments > ~5000 sps because of interactions with the optimizations below.
+
+Simulate multiple buffers of environments so that one buffer is running while your model is processing observations from the other. This technique was introduced by https://github.com/alex-petrenko/sample-factory and does not speed up simulation, but it allows you to interleave simulations from two sets of environments. It’s a good trick, but it is superseded by the final optimization, which is faster and simpler.
+
+Run a pool of environments and sample from the first ones to finish stepping. For example, if you want a batch of 24 observations, you might run 64 environments. At each step, the 24 for which you have computed actions are going to take a while to simulate, but you can still select the fastest 24 from the other 64-24=40 environments. This technique was introduced by https://github.com/sail-sg/envpool and is massively effective, but the original implementation is only for specific C/C++ environments. PufferLib’s implementation is in Python, so it is slower, but it works for arbitrary Python environments and includes native multiagent support.
+
+Experiments
+***********
+
+To evaluate the performance of different backends, I am using a 13900k (24 cores) on a max-specced Maingear desktop running a minimal Debian 12 install. We test 9 different simulated environments: 1e-2 to 1e-4 mean delay with 0-100% delay std. For each environment, we spawn 1, 6, 24, 96, and 192 processes for each backend tested (Gymnasium’s and PufferLib’s serial and multiprocessing implementations + PufferLib’s pool). We also have Ray implementations compatible with our pooling code, but that will be a separate post. Additionally, PufferLib implementations sweep over (1, 2, 4) environments per process and PufferLib pool will compute 24 observations at a time. We do not consider model latency, which can yield another 2x relative performance for pooling on specific workloads.
+
+.. figure:: ../_static/0-5_blog_envpool.png
+
+The figure shows 9 groups of bars, one per environment, each containing 5 subgroups, one per number of processes. The serial Gymnasium/PufferLib experiments match in all cases. The best PufferLib settings are 10-20% faster than the best Gymnasium settings for all workloads and can be up to 2x faster for environments with a high standard deviation in important cases (for instance, you may not want to run 192 copies of heavy environments). Again, this is before even considering the time saved by interleaving with the model forward pass.
+
+All of the implementations start to dip ~10% at 1,000 steps/second and ~50% at 10,000 steps/second. To make absolutely sure that this overhead is unavoidable, I reimplemented the entire pool architecture as minimally as possible, without any of the environment wrapper or data transfer overhead:
+
+SPS: 10734.36 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: False
+SPS: 11640.42 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: True
+SPS: 32715.65 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: False
+SPS: 27635.31 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: True
+SPS: 22681.48 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: False
+SPS: 26183.73 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 24 sync: False
+SPS: 30120.75 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: True
+
+As it turns out, Python’s multiprocessing caps around 10,000 steps per second per worker. There is still room for improvement by running more environments per process, but at this speed, small optimizations to the data processing code start to matter much more.
+
+Technical Details and Gotchas
+*****************************
+
+PufferLib’s vectorization library is extremely concise – around 800 lines for serial, multiprocessing, and Ray backends with support for PufferLib’s Gymnasium and PettingZoo wrappers. Adding EnvPool only required changing around 100 lines of code but required a lot of performance testing:
+- Don’t use multiprocessing.Queue. There’s no fast way to poll which processes are done. Instead, use multiprocessing.Pipe and poll with selectors. I have not seen noticeable overhead from this in any of my tests.
+- Don’t use time.sleep(), as this will trigger context switching, or time.time(), as this will include time spent on other processes. Use time.process_time() if you want an equal slice per core or count to ~150M/second (time it on your machine) if you want a fixed amount of work.
+
+The Ray backend was extremely easy to implement thanks to ray.wait(). It is unfortunately too slow for most environments, but I wish standard multiprocessing used the Ray API, if not the architecture. The library itself has some cleanup issues that can cause crashes during heavy performance tests, which is why results are not included in this post.
+
+There’s one other thing I want to mention for people looking at the code. I was experimenting with a more procedural programming style, so the actual class interfaces are in __init__. It’s pretty much equivalent to one subclass per backend.
+
 PufferLib 0.4: Ready to Take on Bigger Fish
 ###########################################
 
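The pooling idea from the 0.5 post ("run a pool of environments and sample from the first ones to finish"), combined with its advice to poll multiprocessing.Pipe with selectors, can be sketched roughly as follows. This is a minimal illustration and not PufferLib's implementation: `worker` and `run_pool` are hypothetical names, the "environment" is a busy loop with a variable amount of work, and the sketch assumes a Unix-like OS where the `fork` start method is available and pipe connections can be registered with a selector.

```python
import multiprocessing as mp
import random
import selectors


def worker(conn, env_id):
    """Hypothetical stand-in for an environment process."""
    rng = random.Random(env_id)
    while True:
        action = conn.recv()
        if action is None:  # shutdown signal from the parent
            break
        # Busy-loop over a variable amount of work instead of time.sleep(),
        # following the post's advice about timing artifacts.
        for _ in range(rng.randint(1_000, 50_000)):
            pass
        conn.send((env_id, action + 1))  # fake "observation"
    conn.close()


def run_pool(num_envs=8, batch_size=3, num_batches=5):
    """Collect batches from whichever environments finish stepping first."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    sel = selectors.DefaultSelector()
    procs, conns = [], {}
    for env_id in range(num_envs):
        parent, child = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child, env_id), daemon=True)
        proc.start()
        child.close()  # the parent only needs its own end
        sel.register(parent, selectors.EVENT_READ)
        procs.append(proc)
        conns[env_id] = parent
        parent.send(0)  # start every environment stepping

    batches = []
    for _ in range(num_batches):
        ready = []
        while len(ready) < batch_size:
            # Blocks until at least one pipe has a result waiting.
            for key, _events in sel.select():
                if len(ready) == batch_size:
                    break  # leftovers stay readable for the next batch
                ready.append(key.fileobj.recv())
        batches.append(ready)
        # Only the environments that were just sampled receive new actions.
        for env_id, obs in ready:
            conns[env_id].send(obs)

    # Shut down: workers finish any step in flight, then see None and exit.
    for conn in conns.values():
        conn.send(None)
    for proc in procs:
        proc.join(timeout=5)
    for conn in conns.values():
        sel.unregister(conn)
        conn.close()
    sel.close()
    return batches


if __name__ == "__main__":
    for batch in run_pool():
        print(batch)
```

Each batch holds results from `batch_size` distinct environments, because an environment only gets a new action after its previous result has been consumed. The slower environments keep stepping in the background while the batch is processed.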
diff --git a/docs/build/html/_sources/rst/landing.rst.txt b/docs/build/html/_sources/rst/landing.rst.txt
index 4b4661f..d4d72ab 100644
--- a/docs/build/html/_sources/rst/landing.rst.txt
+++ b/docs/build/html/_sources/rst/landing.rst.txt
@@ -44,7 +44,7 @@ You have an environment, a PyTorch model, and a reinforcement learning library t
 
 |
 
-Join our community Discord for support and Discussion, follow my Twitter for news, and star the repo to feed the puffer. :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` appearing at NeurIPS 2023 ALOE Workshop. Come say hi!
+Join our community Discord for support and discussion, follow my Twitter for news, and star the repo to feed the puffer. We also have a :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` featured at the NeurIPS 2023 ALOE workshop.
 
 .. dropdown:: Installation
 
@@ -54,7 +54,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
       `PufferTank <https://github.com/pufferai/puffertank>`_ is a GPU container with PufferLib and dependencies for all environments in the registry, including some that are slow and tricky to install.
 
-      If you are new to containers, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.
+      If you have not used containers before and just want everything to work, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.
 
     .. tab-item:: Pip
 
@@ -76,7 +76,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
    **Joseph Suarez**: Creator and developer of PufferLib
 
-   **David Bloomin**: Policy pool/store/selector
+   **David Bloomin**: 0.4 policy pool/store/selector
 
    **Nick Jenkins**: Layout for the system architecture diagram. Adversary.design.
 
@@ -86,40 +86,45 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
 **You can open this guide in a Colab notebook by clicking the demo button at the top of this page**
 
-Complex environments may have heirarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations and actions and a constant number of agents, with no changes to the underlying environment. Here's how it works with two notoriously complex environments, NetHack and Neural MMO.
+Complex environments may have hierarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations/actions and a constant number of agents. Here's how it works with NetHack and Neural MMO, two notoriously complex environments.
 
 .. code-block:: python
 
   import pufferlib.emulation
+  import pufferlib.wrappers
 
   import nle, nmmo
 
   def nmmo_creator():
-      return pufferlib.emulation.PettingZooPufferEnv(env_creator=nmmo.Env)
+      env = nmmo.Env()
+      env = pufferlib.wrappers.PettingZooTruncatedWrapper(env)
+      return pufferlib.emulation.PettingZooPufferEnv(env=env)
 
   def nethack_creator():
-      return pufferlib.emulation.GymPufferEnv(env_creator=nle.env.NLE)
+      return pufferlib.emulation.GymnasiumPufferEnv(env_creator=nle.env.NLE)
 
-You can pass envs by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.
+The wrappers give you back a Gymnasium/PettingZoo compliant environment. There is no loss of generality and no change to the underlying environment. You can wrap environments by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.
 
 .. code-block:: python
 
   import pufferlib.vectorization
 
-  # vec = pufferlib.vectorization.Serial
-  vec = pufferlib.vectorization.Multiprocessing
+  vec = pufferlib.vectorization.Serial
+  # vec = pufferlib.vectorization.Multiprocessing
   # vec = pufferlib.vectorization.Ray
 
-  envs = vec(nmmo_creator, num_workers=2, envs_per_worker=2)
+  # Vectorization API. Specify total number of environments and number per worker
+  # Setting env_pool=True can be much faster but requires some tweaks to learning code
+  envs = vec(nmmo_creator, num_envs=4, envs_per_worker=2, env_pool=False)
 
-  sync = True
-  if sync:
-      obs = envs.reset()
-  else:
-      envs.async_reset()
-      obs, _, _, _ = envs.recv()
+  # Synchronous API - reset/step
+  # obs = envs.reset()[0]
 
-We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.
+  # Asynchronous API - async_reset, send/recv
+  envs.async_reset()
+  obs = envs.recv()[0]
+
+Our backends support asynchronous on-policy sampling through a Python implementation of EnvPool. This makes them *faster* than the implementations that ship with most RL libraries. We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.
 
 PufferLib allows you to write vanilla PyTorch policies and use them with multiple learning libraries. We take care of the details of converting between the different APIs. Here's a policy that will work with *any* environment, with a one-line wrapper for CleanRL.
 
@@ -132,7 +137,7 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl
   import pufferlib.frameworks.cleanrl
 
   class Policy(nn.Module):
-      def __init__(self, envs):
+      def __init__(self, env):
           super().__init__()
           self.encoder = nn.Linear(np.prod(
               envs.single_observation_space.shape), 128)
@@ -151,12 +156,10 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl
   policy = Policy(envs.driver_env)
   cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
   actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
-  obs, rewards, dones, infos = envs.step(actions)
+  obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
   envs.close()
 
-There's also a lightweight, fully optional base policy class for PufferLib. It breaks the forward pass into two functions, encode_observations and decode_actions. The advantage of this is that it lets us handle recurrance for you, since every framework does this a bit differently.
-
-So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide a registry of environments and models. Here's a complete example.
+There's also an optional policy base class for PufferLib. It just breaks the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here's a complete example.
 
 .. code-block:: python
 
@@ -165,33 +168,32 @@ So far, the code above is fully general and does not rely on PufferLib support f
   import pufferlib.models
   import pufferlib.vectorization
   import pufferlib.frameworks.cleanrl
-  import pufferlib.registry.nmmo
+  import pufferlib.environments.nmmo
 
   envs = pufferlib.vectorization.Multiprocessing(
-      env_creator=pufferlib.registry.nmmo.make_env,
-      num_workers=2, envs_per_worker=2)
+      env_creator=pufferlib.environments.nmmo.make_env,
+      num_envs=4, envs_per_worker=2)
 
-  policy = pufferlib.registry.nmmo.Policy(envs.driver_env)
-  policy = pufferlib.models.RecurrentWrapper(envs, policy,
-      input_size=256, hidden_size=256)
-  cleanrl_policy = pufferlib.frameworks.cleanrl.RecurrentPolicy(policy)
+  policy = pufferlib.environments.nmmo.Policy(envs.driver_env)
+  cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
 
-  obs = envs.reset()
-  obs = torch.Tensor(obs)
-  state = [torch.zeros((1, 256, 256)), torch.zeros((1, 256, 256))]
-  actions = cleanrl_policy.get_action_and_value(obs, state)[0].numpy()
-  obs, rewards, dones, infos = envs.step(actions)
+  env_outputs = envs.reset()[0]
+  obs = torch.Tensor(env_outputs)
+  actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
+  obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
   envs.close()
 
-It's that simple -- almost. If you have an environment with structured observations, you'll hvae to unpack them in the network forward pass since PufferLif will flatten them in emulation. We provide a utility for this -- just be sure to save a reference to your environment inside of the model so you have access to the observation space.
+It's that simple -- almost. If you have an environment with structured observations, you'll have to unpack them in the network forward pass since PufferLib will flatten them in emulation. We provide a utility for this.
 
 .. code-block:: python
 
-  env_outputs = pufferlib.emulation.unpack_batched_obs(
-      env_outputs, self.envs.flat_observation_space
+  obs = pufferlib.emulation.unpack_batched_obs(
+      env_outputs,
+      envs.driver_env.flat_observation_space,
+      envs.driver_env.flat_observation_structure
   )
 
-That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration.
+That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. SB3 and other integrations are coming soon!
 
 Libraries
 #########
@@ -223,7 +225,7 @@ PufferLib provides *pufferlib.frameworks* for the the learning libraries below.
 
 Or view it on GitHub `here <https://github.com/PufferAI/PufferLib/blob/experimental/cleanrl_ppo_atari.py>`_
 
-We are also working on a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. It's still under development, but you can try it out `here <https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py>`_ 
+PufferLib also includes a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. You can try it out `here <https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py>`_ 
 
 .. raw:: html
 
@@ -238,12 +240,12 @@ We are also working on a heavily customized version of CleanRL PPO with support
         </div>
     </div>
 
-While RLlib is great on paper, there are currently a few issues. The pre-gymnasium 2.0 release is very buggy and has next to no error checking on the user API. The latest version may be more stable, but it pins a very recent version of Gymnasium that breaks compatiblity with many environments. We have a simple running script `here <https://github.com/PufferAI/PufferLib/blob/experimental/rllib_ppo.py>`_ that works with 2.0 for now. We will update this when the situation improves.
+We have previously supported RLlib and may again in the future. RLlib has not received updates in a while, and the current release is very buggy. We will update this if the situation improves.
 
 Environments
 ############
 
-We also provide a registry of environments and models that are supported out of the box. These environments are already set up for you in PufferTank and are used in our test cases to ensure they work with PufferLib. Several also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
+We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
 
 
 .. raw:: html
@@ -261,12 +263,12 @@ We also provide a registry of environments and models that are supported out of
 
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-            <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
-                <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
+            <a href="https://github.com/PWhiddy/PokemonRedExperiments" target="_blank">
+                <img src="https://img.shields.io/github/stars/PWhiddy/PokemonRedExperiments?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Pokemon Red" width="100px">
             </a>
         </div>
         <div>
-            <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+            <p><a href="https://github.com/PWhiddy/PokemonRedExperiments">Pokemon Red</a> is one of the original Pokemon games for the Game Boy. This project uses the game as an environment for reinforcement learning. We are actively supporting development on this one!</p>
         </div>
     </div>
 
@@ -283,12 +285,23 @@ We also provide a registry of environments and models that are supported out of
 
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-            <a href="https://github.com/neuralmmo/environment" target="_blank">
-                <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+            <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
+                <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
             </a>
         </div>
         <div>
-            <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+            <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+        </div>
+    </div>
+
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/Farama-Foundation/Minigrid" target="_blank">
+                <img src="https://img.shields.io/github/stars/Farama-Foundation/Minigrid?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Minigrid" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://github.com/Farama-Foundation/Minigrid">Minigrid</a> is a 2D grid-world environment engine and a collection of built-in environments. It targets flexible and computationally efficient RL research.</p>
         </div>
     </div>
 
@@ -303,6 +316,17 @@ We also provide a registry of environments and models that are supported out of
         </div>
     </div>
 
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/neuralmmo/environment" target="_blank">
+                <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+        </div>
+    </div>
+
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
             <a href="https://github.com/openai/procgen" target="_blank">
@@ -325,6 +349,17 @@ We also provide a registry of environments and models that are supported out of
         </div>
     </div>
 
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/facebookresearch/minihack" target="_blank">
+                <img src="https://img.shields.io/github/stars/facebookresearch/minihack?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star MiniHack" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://github.com/facebookresearch/minihack">MiniHack Learning Environment</a> is a stripped-down version of NetHack with support for level editing and custom procedural generation.</p>
+        </div>
+    </div>
+
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
             <a href="https://github.com/danijar/crafter" target="_blank">
@@ -362,11 +397,9 @@ Current Limitations
 ###################
 
 - No continuous action spaces (WIP)
-- Pre-gymnasium Gym and PettingZoo only (WIP)
 - Support for heterogenous observations and actions requires you to specify teams such that each team has the same observation and action space. There's no good way around this.
 
 License
 #######
 
-PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI; we do not have private repositories with additional utilities.
-
+PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public, and we do not have private repositories with additional utilities.
diff --git a/docs/build/html/_static/0-5_blog_envpool.png b/docs/build/html/_static/0-5_blog_envpool.png
new file mode 100644
index 0000000..46a472e
Binary files /dev/null and b/docs/build/html/_static/0-5_blog_envpool.png differ
diff --git a/docs/build/html/_static/0-5_blog_header.png b/docs/build/html/_static/0-5_blog_header.png
new file mode 100644
index 0000000..1c49024
Binary files /dev/null and b/docs/build/html/_static/0-5_blog_header.png differ
diff --git a/docs/build/html/_static/documentation_options.js b/docs/build/html/_static/documentation_options.js
index 7696827..0ce3c19 100644
--- a/docs/build/html/_static/documentation_options.js
+++ b/docs/build/html/_static/documentation_options.js
@@ -1,6 +1,6 @@
 var DOCUMENTATION_OPTIONS = {
     URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
-    VERSION: '0.4.3',
+    VERSION: '0.5.0',
     LANGUAGE: 'en',
     COLLAPSE_INDEX: false,
     BUILDER: 'html',
diff --git a/docs/build/html/_static/pygments.css b/docs/build/html/_static/pygments.css
index 9c45769..5c8cad8 100644
--- a/docs/build/html/_static/pygments.css
+++ b/docs/build/html/_static/pygments.css
@@ -22,6 +22,7 @@
 .highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */
 .highlight .gd { color: #a40000 } /* Generic.Deleted */
 .highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */
+.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 .highlight .gr { color: #ef2929 } /* Generic.Error */
 .highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
 .highlight .gi { color: #00A000 } /* Generic.Inserted */
@@ -89,35 +90,36 @@ body[data-theme="dark"] .highlight td.linenos .special { color: #000000; backgro
 body[data-theme="dark"] .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
 body[data-theme="dark"] .highlight .hll { background-color: #49483e }
 body[data-theme="dark"] .highlight { background: #272822; color: #f8f8f2 }
-body[data-theme="dark"] .highlight .c { color: #75715e } /* Comment */
-body[data-theme="dark"] .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
+body[data-theme="dark"] .highlight .c { color: #959077 } /* Comment */
+body[data-theme="dark"] .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */
 body[data-theme="dark"] .highlight .esc { color: #f8f8f2 } /* Escape */
 body[data-theme="dark"] .highlight .g { color: #f8f8f2 } /* Generic */
 body[data-theme="dark"] .highlight .k { color: #66d9ef } /* Keyword */
 body[data-theme="dark"] .highlight .l { color: #ae81ff } /* Literal */
 body[data-theme="dark"] .highlight .n { color: #f8f8f2 } /* Name */
-body[data-theme="dark"] .highlight .o { color: #f92672 } /* Operator */
+body[data-theme="dark"] .highlight .o { color: #ff4689 } /* Operator */
 body[data-theme="dark"] .highlight .x { color: #f8f8f2 } /* Other */
 body[data-theme="dark"] .highlight .p { color: #f8f8f2 } /* Punctuation */
-body[data-theme="dark"] .highlight .ch { color: #75715e } /* Comment.Hashbang */
-body[data-theme="dark"] .highlight .cm { color: #75715e } /* Comment.Multiline */
-body[data-theme="dark"] .highlight .cp { color: #75715e } /* Comment.Preproc */
-body[data-theme="dark"] .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
-body[data-theme="dark"] .highlight .c1 { color: #75715e } /* Comment.Single */
-body[data-theme="dark"] .highlight .cs { color: #75715e } /* Comment.Special */
-body[data-theme="dark"] .highlight .gd { color: #f92672 } /* Generic.Deleted */
+body[data-theme="dark"] .highlight .ch { color: #959077 } /* Comment.Hashbang */
+body[data-theme="dark"] .highlight .cm { color: #959077 } /* Comment.Multiline */
+body[data-theme="dark"] .highlight .cp { color: #959077 } /* Comment.Preproc */
+body[data-theme="dark"] .highlight .cpf { color: #959077 } /* Comment.PreprocFile */
+body[data-theme="dark"] .highlight .c1 { color: #959077 } /* Comment.Single */
+body[data-theme="dark"] .highlight .cs { color: #959077 } /* Comment.Special */
+body[data-theme="dark"] .highlight .gd { color: #ff4689 } /* Generic.Deleted */
 body[data-theme="dark"] .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
+body[data-theme="dark"] .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 body[data-theme="dark"] .highlight .gr { color: #f8f8f2 } /* Generic.Error */
 body[data-theme="dark"] .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
 body[data-theme="dark"] .highlight .gi { color: #a6e22e } /* Generic.Inserted */
 body[data-theme="dark"] .highlight .go { color: #66d9ef } /* Generic.Output */
-body[data-theme="dark"] .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
+body[data-theme="dark"] .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */
 body[data-theme="dark"] .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
-body[data-theme="dark"] .highlight .gu { color: #75715e } /* Generic.Subheading */
+body[data-theme="dark"] .highlight .gu { color: #959077 } /* Generic.Subheading */
 body[data-theme="dark"] .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
 body[data-theme="dark"] .highlight .kc { color: #66d9ef } /* Keyword.Constant */
 body[data-theme="dark"] .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
-body[data-theme="dark"] .highlight .kn { color: #f92672 } /* Keyword.Namespace */
+body[data-theme="dark"] .highlight .kn { color: #ff4689 } /* Keyword.Namespace */
 body[data-theme="dark"] .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
 body[data-theme="dark"] .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
 body[data-theme="dark"] .highlight .kt { color: #66d9ef } /* Keyword.Type */
@@ -136,9 +138,9 @@ body[data-theme="dark"] .highlight .nl { color: #f8f8f2 } /* Name.Label */
 body[data-theme="dark"] .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
 body[data-theme="dark"] .highlight .nx { color: #a6e22e } /* Name.Other */
 body[data-theme="dark"] .highlight .py { color: #f8f8f2 } /* Name.Property */
-body[data-theme="dark"] .highlight .nt { color: #f92672 } /* Name.Tag */
+body[data-theme="dark"] .highlight .nt { color: #ff4689 } /* Name.Tag */
 body[data-theme="dark"] .highlight .nv { color: #f8f8f2 } /* Name.Variable */
-body[data-theme="dark"] .highlight .ow { color: #f92672 } /* Operator.Word */
+body[data-theme="dark"] .highlight .ow { color: #ff4689 } /* Operator.Word */
 body[data-theme="dark"] .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
 body[data-theme="dark"] .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
 body[data-theme="dark"] .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
@@ -174,35 +176,36 @@ body:not([data-theme="light"]) .highlight td.linenos .special { color: #000000;
 body:not([data-theme="light"]) .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
 body:not([data-theme="light"]) .highlight .hll { background-color: #49483e }
 body:not([data-theme="light"]) .highlight { background: #272822; color: #f8f8f2 }
-body:not([data-theme="light"]) .highlight .c { color: #75715e } /* Comment */
-body:not([data-theme="light"]) .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
+body:not([data-theme="light"]) .highlight .c { color: #959077 } /* Comment */
+body:not([data-theme="light"]) .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */
 body:not([data-theme="light"]) .highlight .esc { color: #f8f8f2 } /* Escape */
 body:not([data-theme="light"]) .highlight .g { color: #f8f8f2 } /* Generic */
 body:not([data-theme="light"]) .highlight .k { color: #66d9ef } /* Keyword */
 body:not([data-theme="light"]) .highlight .l { color: #ae81ff } /* Literal */
 body:not([data-theme="light"]) .highlight .n { color: #f8f8f2 } /* Name */
-body:not([data-theme="light"]) .highlight .o { color: #f92672 } /* Operator */
+body:not([data-theme="light"]) .highlight .o { color: #ff4689 } /* Operator */
 body:not([data-theme="light"]) .highlight .x { color: #f8f8f2 } /* Other */
 body:not([data-theme="light"]) .highlight .p { color: #f8f8f2 } /* Punctuation */
-body:not([data-theme="light"]) .highlight .ch { color: #75715e } /* Comment.Hashbang */
-body:not([data-theme="light"]) .highlight .cm { color: #75715e } /* Comment.Multiline */
-body:not([data-theme="light"]) .highlight .cp { color: #75715e } /* Comment.Preproc */
-body:not([data-theme="light"]) .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
-body:not([data-theme="light"]) .highlight .c1 { color: #75715e } /* Comment.Single */
-body:not([data-theme="light"]) .highlight .cs { color: #75715e } /* Comment.Special */
-body:not([data-theme="light"]) .highlight .gd { color: #f92672 } /* Generic.Deleted */
+body:not([data-theme="light"]) .highlight .ch { color: #959077 } /* Comment.Hashbang */
+body:not([data-theme="light"]) .highlight .cm { color: #959077 } /* Comment.Multiline */
+body:not([data-theme="light"]) .highlight .cp { color: #959077 } /* Comment.Preproc */
+body:not([data-theme="light"]) .highlight .cpf { color: #959077 } /* Comment.PreprocFile */
+body:not([data-theme="light"]) .highlight .c1 { color: #959077 } /* Comment.Single */
+body:not([data-theme="light"]) .highlight .cs { color: #959077 } /* Comment.Special */
+body:not([data-theme="light"]) .highlight .gd { color: #ff4689 } /* Generic.Deleted */
 body:not([data-theme="light"]) .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
+body:not([data-theme="light"]) .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 body:not([data-theme="light"]) .highlight .gr { color: #f8f8f2 } /* Generic.Error */
 body:not([data-theme="light"]) .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
 body:not([data-theme="light"]) .highlight .gi { color: #a6e22e } /* Generic.Inserted */
 body:not([data-theme="light"]) .highlight .go { color: #66d9ef } /* Generic.Output */
-body:not([data-theme="light"]) .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
+body:not([data-theme="light"]) .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */
 body:not([data-theme="light"]) .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
-body:not([data-theme="light"]) .highlight .gu { color: #75715e } /* Generic.Subheading */
+body:not([data-theme="light"]) .highlight .gu { color: #959077 } /* Generic.Subheading */
 body:not([data-theme="light"]) .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
 body:not([data-theme="light"]) .highlight .kc { color: #66d9ef } /* Keyword.Constant */
 body:not([data-theme="light"]) .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
-body:not([data-theme="light"]) .highlight .kn { color: #f92672 } /* Keyword.Namespace */
+body:not([data-theme="light"]) .highlight .kn { color: #ff4689 } /* Keyword.Namespace */
 body:not([data-theme="light"]) .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
 body:not([data-theme="light"]) .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
 body:not([data-theme="light"]) .highlight .kt { color: #66d9ef } /* Keyword.Type */
@@ -221,9 +224,9 @@ body:not([data-theme="light"]) .highlight .nl { color: #f8f8f2 } /* Name.Label *
 body:not([data-theme="light"]) .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
 body:not([data-theme="light"]) .highlight .nx { color: #a6e22e } /* Name.Other */
 body:not([data-theme="light"]) .highlight .py { color: #f8f8f2 } /* Name.Property */
-body:not([data-theme="light"]) .highlight .nt { color: #f92672 } /* Name.Tag */
+body:not([data-theme="light"]) .highlight .nt { color: #ff4689 } /* Name.Tag */
 body:not([data-theme="light"]) .highlight .nv { color: #f8f8f2 } /* Name.Variable */
-body:not([data-theme="light"]) .highlight .ow { color: #f92672 } /* Operator.Word */
+body:not([data-theme="light"]) .highlight .ow { color: #ff4689 } /* Operator.Word */
 body:not([data-theme="light"]) .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
 body:not([data-theme="light"]) .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
 body:not([data-theme="light"]) .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
diff --git a/docs/build/html/genindex.html b/docs/build/html/genindex.html
index e47cc15..006f471 100644
--- a/docs/build/html/genindex.html
+++ b/docs/build/html/genindex.html
@@ -4,7 +4,7 @@
     <meta name="viewport" content="width=device-width,initial-scale=1"/>
     <meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="#" /><link rel="search" title="Search" href="search.html" />
 
-    <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Index - PufferLib 0.4.3 documentation</title>
+    <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Index - PufferLib 0.5.0 documentation</title>
 <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -188,7 +188,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="index.html"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="index.html"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -211,7 +211,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="search.html" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -229,15 +229,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="rst/api.html#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
diff --git a/docs/build/html/index.html b/docs/build/html/index.html
index fbb3c7c..e10ce4d 100644
--- a/docs/build/html/index.html
+++ b/docs/build/html/index.html
@@ -6,7 +6,7 @@
 <link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Libraries" href="rst/landing.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
-        <title>PufferLib 0.4.3 documentation</title>
+        <title>PufferLib 0.5.0 documentation</title>
       <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -190,7 +190,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="#"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="#"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -213,7 +213,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="#">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="search.html" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -231,15 +231,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="rst/api.html#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
@@ -290,29 +292,25 @@ <h1>Index<a class="headerlink" href="#index" title="Permalink to this heading">#
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="rst/api.html#registry">Registry</a><ul>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#atari">Atari</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#butterfly">Butterfly</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#classic-control">Classic Control</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#crafter">Crafter</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#griddly">Griddly</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#magent">MAgent</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#microrts">MicroRTS</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#nethack">NetHack</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#neural-mmo">Neural MMO</a></li>
-<li class="toctree-l2"><a class="reference internal" href="rst/api.html#procgen">Procgen</a></li>
-</ul>
-</li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 </div>
 <div class="toctree-wrapper compound">
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a><ul>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="rst/blog.html#the-simulation-crisis">The Simulation Crisis</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rst/blog.html#the-solution">The Solution</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rst/blog.html#experiments">Experiments</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rst/blog.html#technical-details-and-gotchas">Technical Details and Gotchas</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#emulation">Emulation</a></li>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#vectorization">Vectorization</a></li>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#puffertank">PufferTank</a></li>
diff --git a/docs/build/html/objects.inv b/docs/build/html/objects.inv
index 29bc024..168899d 100644
Binary files a/docs/build/html/objects.inv and b/docs/build/html/objects.inv differ
diff --git a/docs/build/html/rst/_static/0-5_blog_envpool.png b/docs/build/html/rst/_static/0-5_blog_envpool.png
new file mode 100644
index 0000000..46a472e
Binary files /dev/null and b/docs/build/html/rst/_static/0-5_blog_envpool.png differ
diff --git a/docs/build/html/rst/_static/0-5_blog_header.png b/docs/build/html/rst/_static/0-5_blog_header.png
new file mode 100644
index 0000000..1c49024
Binary files /dev/null and b/docs/build/html/rst/_static/0-5_blog_header.png differ
diff --git a/docs/build/html/rst/_static/documentation_options.js b/docs/build/html/rst/_static/documentation_options.js
index 7696827..0ce3c19 100644
--- a/docs/build/html/rst/_static/documentation_options.js
+++ b/docs/build/html/rst/_static/documentation_options.js
@@ -1,6 +1,6 @@
 var DOCUMENTATION_OPTIONS = {
     URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
-    VERSION: '0.4.3',
+    VERSION: '0.5.0',
     LANGUAGE: 'en',
     COLLAPSE_INDEX: false,
     BUILDER: 'html',
diff --git a/docs/build/html/rst/_static/pygments.css b/docs/build/html/rst/_static/pygments.css
index 9c45769..5c8cad8 100644
--- a/docs/build/html/rst/_static/pygments.css
+++ b/docs/build/html/rst/_static/pygments.css
@@ -22,6 +22,7 @@
 .highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */
 .highlight .gd { color: #a40000 } /* Generic.Deleted */
 .highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */
+.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 .highlight .gr { color: #ef2929 } /* Generic.Error */
 .highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
 .highlight .gi { color: #00A000 } /* Generic.Inserted */
@@ -89,35 +90,36 @@ body[data-theme="dark"] .highlight td.linenos .special { color: #000000; backgro
 body[data-theme="dark"] .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
 body[data-theme="dark"] .highlight .hll { background-color: #49483e }
 body[data-theme="dark"] .highlight { background: #272822; color: #f8f8f2 }
-body[data-theme="dark"] .highlight .c { color: #75715e } /* Comment */
-body[data-theme="dark"] .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
+body[data-theme="dark"] .highlight .c { color: #959077 } /* Comment */
+body[data-theme="dark"] .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */
 body[data-theme="dark"] .highlight .esc { color: #f8f8f2 } /* Escape */
 body[data-theme="dark"] .highlight .g { color: #f8f8f2 } /* Generic */
 body[data-theme="dark"] .highlight .k { color: #66d9ef } /* Keyword */
 body[data-theme="dark"] .highlight .l { color: #ae81ff } /* Literal */
 body[data-theme="dark"] .highlight .n { color: #f8f8f2 } /* Name */
-body[data-theme="dark"] .highlight .o { color: #f92672 } /* Operator */
+body[data-theme="dark"] .highlight .o { color: #ff4689 } /* Operator */
 body[data-theme="dark"] .highlight .x { color: #f8f8f2 } /* Other */
 body[data-theme="dark"] .highlight .p { color: #f8f8f2 } /* Punctuation */
-body[data-theme="dark"] .highlight .ch { color: #75715e } /* Comment.Hashbang */
-body[data-theme="dark"] .highlight .cm { color: #75715e } /* Comment.Multiline */
-body[data-theme="dark"] .highlight .cp { color: #75715e } /* Comment.Preproc */
-body[data-theme="dark"] .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
-body[data-theme="dark"] .highlight .c1 { color: #75715e } /* Comment.Single */
-body[data-theme="dark"] .highlight .cs { color: #75715e } /* Comment.Special */
-body[data-theme="dark"] .highlight .gd { color: #f92672 } /* Generic.Deleted */
+body[data-theme="dark"] .highlight .ch { color: #959077 } /* Comment.Hashbang */
+body[data-theme="dark"] .highlight .cm { color: #959077 } /* Comment.Multiline */
+body[data-theme="dark"] .highlight .cp { color: #959077 } /* Comment.Preproc */
+body[data-theme="dark"] .highlight .cpf { color: #959077 } /* Comment.PreprocFile */
+body[data-theme="dark"] .highlight .c1 { color: #959077 } /* Comment.Single */
+body[data-theme="dark"] .highlight .cs { color: #959077 } /* Comment.Special */
+body[data-theme="dark"] .highlight .gd { color: #ff4689 } /* Generic.Deleted */
 body[data-theme="dark"] .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
+body[data-theme="dark"] .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 body[data-theme="dark"] .highlight .gr { color: #f8f8f2 } /* Generic.Error */
 body[data-theme="dark"] .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
 body[data-theme="dark"] .highlight .gi { color: #a6e22e } /* Generic.Inserted */
 body[data-theme="dark"] .highlight .go { color: #66d9ef } /* Generic.Output */
-body[data-theme="dark"] .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
+body[data-theme="dark"] .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */
 body[data-theme="dark"] .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
-body[data-theme="dark"] .highlight .gu { color: #75715e } /* Generic.Subheading */
+body[data-theme="dark"] .highlight .gu { color: #959077 } /* Generic.Subheading */
 body[data-theme="dark"] .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
 body[data-theme="dark"] .highlight .kc { color: #66d9ef } /* Keyword.Constant */
 body[data-theme="dark"] .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
-body[data-theme="dark"] .highlight .kn { color: #f92672 } /* Keyword.Namespace */
+body[data-theme="dark"] .highlight .kn { color: #ff4689 } /* Keyword.Namespace */
 body[data-theme="dark"] .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
 body[data-theme="dark"] .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
 body[data-theme="dark"] .highlight .kt { color: #66d9ef } /* Keyword.Type */
@@ -136,9 +138,9 @@ body[data-theme="dark"] .highlight .nl { color: #f8f8f2 } /* Name.Label */
 body[data-theme="dark"] .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
 body[data-theme="dark"] .highlight .nx { color: #a6e22e } /* Name.Other */
 body[data-theme="dark"] .highlight .py { color: #f8f8f2 } /* Name.Property */
-body[data-theme="dark"] .highlight .nt { color: #f92672 } /* Name.Tag */
+body[data-theme="dark"] .highlight .nt { color: #ff4689 } /* Name.Tag */
 body[data-theme="dark"] .highlight .nv { color: #f8f8f2 } /* Name.Variable */
-body[data-theme="dark"] .highlight .ow { color: #f92672 } /* Operator.Word */
+body[data-theme="dark"] .highlight .ow { color: #ff4689 } /* Operator.Word */
 body[data-theme="dark"] .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
 body[data-theme="dark"] .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
 body[data-theme="dark"] .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
@@ -174,35 +176,36 @@ body:not([data-theme="light"]) .highlight td.linenos .special { color: #000000;
 body:not([data-theme="light"]) .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
 body:not([data-theme="light"]) .highlight .hll { background-color: #49483e }
 body:not([data-theme="light"]) .highlight { background: #272822; color: #f8f8f2 }
-body:not([data-theme="light"]) .highlight .c { color: #75715e } /* Comment */
-body:not([data-theme="light"]) .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
+body:not([data-theme="light"]) .highlight .c { color: #959077 } /* Comment */
+body:not([data-theme="light"]) .highlight .err { color: #ed007e; background-color: #1e0010 } /* Error */
 body:not([data-theme="light"]) .highlight .esc { color: #f8f8f2 } /* Escape */
 body:not([data-theme="light"]) .highlight .g { color: #f8f8f2 } /* Generic */
 body:not([data-theme="light"]) .highlight .k { color: #66d9ef } /* Keyword */
 body:not([data-theme="light"]) .highlight .l { color: #ae81ff } /* Literal */
 body:not([data-theme="light"]) .highlight .n { color: #f8f8f2 } /* Name */
-body:not([data-theme="light"]) .highlight .o { color: #f92672 } /* Operator */
+body:not([data-theme="light"]) .highlight .o { color: #ff4689 } /* Operator */
 body:not([data-theme="light"]) .highlight .x { color: #f8f8f2 } /* Other */
 body:not([data-theme="light"]) .highlight .p { color: #f8f8f2 } /* Punctuation */
-body:not([data-theme="light"]) .highlight .ch { color: #75715e } /* Comment.Hashbang */
-body:not([data-theme="light"]) .highlight .cm { color: #75715e } /* Comment.Multiline */
-body:not([data-theme="light"]) .highlight .cp { color: #75715e } /* Comment.Preproc */
-body:not([data-theme="light"]) .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
-body:not([data-theme="light"]) .highlight .c1 { color: #75715e } /* Comment.Single */
-body:not([data-theme="light"]) .highlight .cs { color: #75715e } /* Comment.Special */
-body:not([data-theme="light"]) .highlight .gd { color: #f92672 } /* Generic.Deleted */
+body:not([data-theme="light"]) .highlight .ch { color: #959077 } /* Comment.Hashbang */
+body:not([data-theme="light"]) .highlight .cm { color: #959077 } /* Comment.Multiline */
+body:not([data-theme="light"]) .highlight .cp { color: #959077 } /* Comment.Preproc */
+body:not([data-theme="light"]) .highlight .cpf { color: #959077 } /* Comment.PreprocFile */
+body:not([data-theme="light"]) .highlight .c1 { color: #959077 } /* Comment.Single */
+body:not([data-theme="light"]) .highlight .cs { color: #959077 } /* Comment.Special */
+body:not([data-theme="light"]) .highlight .gd { color: #ff4689 } /* Generic.Deleted */
 body:not([data-theme="light"]) .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
+body:not([data-theme="light"]) .highlight .ges { color: #f8f8f2; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
 body:not([data-theme="light"]) .highlight .gr { color: #f8f8f2 } /* Generic.Error */
 body:not([data-theme="light"]) .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
 body:not([data-theme="light"]) .highlight .gi { color: #a6e22e } /* Generic.Inserted */
 body:not([data-theme="light"]) .highlight .go { color: #66d9ef } /* Generic.Output */
-body:not([data-theme="light"]) .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
+body:not([data-theme="light"]) .highlight .gp { color: #ff4689; font-weight: bold } /* Generic.Prompt */
 body:not([data-theme="light"]) .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
-body:not([data-theme="light"]) .highlight .gu { color: #75715e } /* Generic.Subheading */
+body:not([data-theme="light"]) .highlight .gu { color: #959077 } /* Generic.Subheading */
 body:not([data-theme="light"]) .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
 body:not([data-theme="light"]) .highlight .kc { color: #66d9ef } /* Keyword.Constant */
 body:not([data-theme="light"]) .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
-body:not([data-theme="light"]) .highlight .kn { color: #f92672 } /* Keyword.Namespace */
+body:not([data-theme="light"]) .highlight .kn { color: #ff4689 } /* Keyword.Namespace */
 body:not([data-theme="light"]) .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
 body:not([data-theme="light"]) .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
 body:not([data-theme="light"]) .highlight .kt { color: #66d9ef } /* Keyword.Type */
@@ -221,9 +224,9 @@ body:not([data-theme="light"]) .highlight .nl { color: #f8f8f2 } /* Name.Label *
 body:not([data-theme="light"]) .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
 body:not([data-theme="light"]) .highlight .nx { color: #a6e22e } /* Name.Other */
 body:not([data-theme="light"]) .highlight .py { color: #f8f8f2 } /* Name.Property */
-body:not([data-theme="light"]) .highlight .nt { color: #f92672 } /* Name.Tag */
+body:not([data-theme="light"]) .highlight .nt { color: #ff4689 } /* Name.Tag */
 body:not([data-theme="light"]) .highlight .nv { color: #f8f8f2 } /* Name.Variable */
-body:not([data-theme="light"]) .highlight .ow { color: #f92672 } /* Operator.Word */
+body:not([data-theme="light"]) .highlight .ow { color: #ff4689 } /* Operator.Word */
 body:not([data-theme="light"]) .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
 body:not([data-theme="light"]) .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
 body:not([data-theme="light"]) .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
diff --git a/docs/build/html/rst/api.html b/docs/build/html/rst/api.html
index 12a0de7..00ef2af 100644
--- a/docs/build/html/rst/api.html
+++ b/docs/build/html/rst/api.html
@@ -3,10 +3,10 @@
   <head><meta charset="utf-8"/>
     <meta name="viewport" content="width=device-width,initial-scale=1"/>
     <meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
-<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="PufferLib 0.4: Ready to Take on Bigger Fish" href="blog.html" /><link rel="prev" title="Libraries" href="landing.html" />
+<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="PufferLib 0.5: A Bigger EnvPool for Growing Puffers" href="blog.html" /><link rel="prev" title="Libraries" href="landing.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
-        <title>Emulation - PufferLib 0.4.3 documentation</title>
+        <title>Emulation - PufferLib 0.5.0 documentation</title>
       <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="../_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -190,7 +190,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="../index.html"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="../index.html"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -213,7 +213,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -231,15 +231,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul class="current">
 <li class="toctree-l1 current current-page"><a class="current reference internal" href="#">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
@@ -285,7 +287,7 @@ <h1>Emulation<a class="headerlink" href="#emulation" title="Permalink to this he
 <p>Wrap your environments for broad compatibility. Supports passing creator functions, classes, or env objects. The API of the returned PufferEnv is the same as Gym/PettingZoo.</p>
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.emulation.</span></span><span class="sig-name descname"><span class="pre">GymPufferEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_creator=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args=[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_cls=&lt;class</span> <span class="pre">'pufferlib.emulation.Postprocessor'&gt;</span></span></em><span class="sig-paren">)</span></dt>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.emulation.</span></span><span class="sig-name descname"><span class="pre">GymnasiumPufferEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_creator=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args=[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_cls=&lt;class</span> <span class="pre">'pufferlib.emulation.BasicPostprocessor'&gt;</span></span></em><span class="sig-paren">)</span></dt>
 <dd><dl class="py property">
 <dt class="sig sig-object py">
 <em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">observation_space</span></span></dt>
@@ -330,14 +332,17 @@ <h1>Emulation<a class="headerlink" href="#emulation" title="Permalink to this he
 <span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
 <dd><p>Resets the environment to an initial state and returns an initial
 observation.</p>
-<p>Note that this function should not reset the environment’s random
-number generator(s); random variables in the environment’s state should
-be sampled independently between multiple calls to <cite>reset()</cite>. In other
-words, each call of <cite>reset()</cite> should yield an environment suitable for
-a new episode, independent of previous episodes.</p>
+<p>This method should also reset the environment’s random number
+generator(s) if <cite>seed</cite> is an integer or if the environment has not
+yet initialized a random number generator. If the environment already
+has a random number generator and <cite>reset</cite> is called with <cite>seed=None</cite>,
+the RNG should not be reset.
+Moreover, <cite>reset</cite> should (in the typical use case) be called with an
+integer seed right after initialization and then never again.</p>
 <dl class="field-list simple">
 <dt class="field-odd">Returns<span class="colon">:</span></dt>
-<dd class="field-odd"><p>the initial observation.</p>
+<dd class="field-odd"><p>the initial observation.
+info (optional dictionary): a dictionary containing extra information; this is only returned if return_info is set to true</p>
 </dd>
 <dt class="field-even">Return type<span class="colon">:</span></dt>
 <dd class="field-even"><p>observation (object)</p>
@@ -353,556 +358,80 @@ <h1>Emulation<a class="headerlink" href="#emulation" title="Permalink to this he
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Override close in your subclass to perform any necessary cleanup.</p>
-<p>Environments will automatically close() themselves when
-garbage collected or when the program exits.</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batched_obs</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.emulation.</span></span><span class="sig-name descname"><span class="pre">PettingZooPufferEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_creator=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args=[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_cls=&lt;class</span> <span class="pre">'pufferlib.emulation.Postprocessor'&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">teams=None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><dl class="py property">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">agents</span></span></dt>
-<dd></dd></dl>
-
-<dl class="py property">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">done</span></span></dt>
-<dd></dd></dl>
-
-<dl class="py property">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_observation_space</span></span></dt>
-<dd></dd></dl>
-
-<dl class="py property">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_action_space</span></span></dt>
-<dd></dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">observation_space</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">agent</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Returns the observation space for a single agent</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">action_space</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">agent</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Returns the action space for a single agent</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">step</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Step the environment and return (observations, rewards, dones, infos)</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batched_obs</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-</section>
-<section id="registry">
-<h1>Registry<a class="headerlink" href="#registry" title="Permalink to this heading">#</a></h1>
-<p>make_env functions and policies for included environments.</p>
-<section id="atari">
-<h2>Atari<a class="headerlink" href="#atari" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.atari.</span></span><span class="sig-name descname"><span class="pre">NoopResetEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Env</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">noop_max</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">30</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Sample initial states by taking random number of no-ops on reset.
-No-op is assumed to be action 0.</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>env</strong> – the environment to wrap</p></li>
-<li><p><strong>noop_max</strong> – the maximum value of no-ops to run</p></li>
+<span class="sig-name descname"><span class="pre">render</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd><p>Renders the environment.</p>
+<p>The set of supported modes varies per environment. (And some
+third-party environments may not support rendering at all.)
+By convention, if mode is:</p>
+<ul class="simple">
+<li><p>human: render to the current display or terminal and
+return nothing. Usually for human consumption.</p></li>
+<li><p>rgb_array: Return a numpy.ndarray with shape (x, y, 3),
+representing RGB values for an x-by-y pixel image, suitable
+for turning into a video.</p></li>
+<li><p>ansi: Return a string (str) or StringIO.StringIO containing a
+terminal-style text representation. The text can include newlines
+and ANSI escape sequences (e.g. for colors).</p></li>
 </ul>
-</dd>
-</dl>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#x2192;</span> <span class="sig-return-typehint"><span class="pre">ndarray</span></span></span></dt>
-<dd><p>Resets the environment to an initial state and returns an initial
-observation.</p>
-<p>Note that this function should not reset the environment’s random
-number generator(s); random variables in the environment’s state should
-be sampled independently between multiple calls to <cite>reset()</cite>. In other
-words, each call of <cite>reset()</cite> should yield an environment suitable for
-a new episode, independent of previous episodes.</p>
-<dl class="field-list simple">
-<dt class="field-odd">Returns<span class="colon">:</span></dt>
-<dd class="field-odd"><p>the initial observation.</p>
-</dd>
-<dt class="field-even">Return type<span class="colon">:</span></dt>
-<dd class="field-even"><p>observation (object)</p>
-</dd>
-</dl>
-</dd></dl>
-
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.atari.</span></span><span class="sig-name descname"><span class="pre">AtariFeaturizer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">is_multiagent</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">agent_id</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Postprocessors provide full access to the environment</p>
-<p>This means you can use them to cheat. Don’t blame us if you do.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Called at the beginning of each episode</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">observation</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Called on each observation after it is returned by the environment</p>
-<p>You must override Postprocessor.observation_space if this function
-changes the structure of observations.</p>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">reward_done_info</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">reward</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">info</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Called on the reward, done, and info after they are returned by the environment</p>
-</dd></dl>
-
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.atari.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">framestack</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Atari creation function with default CleanRL preprocessing based on Stable Baselines3 wrappers</p>
-</dd></dl>
-
-</section>
-<section id="butterfly">
-<h2>Butterfly<a class="headerlink" href="#butterfly" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.butterfly.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>The CleanRL default Atari policy: a stack of three convolutions followed by a linear layer</p>
-<p>Takes framestack as a mandatory keyword arguments. Suggested default is 1 frame
-with LSTM or 4 frames without.</p>
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.butterfly.</span></span><span class="sig-name descname"><span class="pre">make_knights_archers_zombies_v10</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Knights Archers Zombies creation function</p>
-<p>Not yet supported: requires heterogeneous observations’’</p>
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.butterfly.</span></span><span class="sig-name descname"><span class="pre">make_cooperative_pong_v5</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Cooperative Pong creation function</p>
-</dd></dl>
-
-</section>
-<section id="classic-control">
-<h2>Classic Control<a class="headerlink" href="#classic-control" title="Permalink to this heading">#</a></h2>
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.classic_control.</span></span><span class="sig-name descname"><span class="pre">make_cartpole_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>CartPole creation function</p>
-<p>This env is a useful test because it works without
-any additional dependencies</p>
-</dd></dl>
-
-</section>
-<section id="crafter">
-<h2>Crafter<a class="headerlink" href="#crafter" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.crafter.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>The CleanRL default Atari policy: a stack of three convolutions followed by a linear layer</p>
-<p>Takes framestack as a mandatory keyword arguments. Suggested default is 1 frame
-with LSTM or 4 frames without.</p>
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.crafter.</span></span><span class="sig-name descname"><span class="pre">CrafterPostprocessor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">is_multiagent</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">agent_id</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Postprocessors provide full access to the environment</p>
-<p>This means you can use them to cheat. Don’t blame us if you do.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">features</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">step</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.crafter.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Crafter creation function</p>
-</dd></dl>
-
-</section>
-<section id="griddly">
-<h2>Griddly<a class="headerlink" href="#griddly" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.griddly.</span></span><span class="sig-name descname"><span class="pre">GriddlyGymPufferEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">{}</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Griddly envs need to be reset in order to define their obs space</p>
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.griddly.</span></span><span class="sig-name descname"><span class="pre">make_spider_v0_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Griddly Spiders binding creation function</p>
-<p>Support for Griddly is WIP because environments do not specify
-their observation spaces until after they are created.</p>
-</dd></dl>
-
-</section>
-<section id="magent">
-<h2>MAgent<a class="headerlink" href="#magent" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.magent.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Based off of the DQN policy in MAgent</p>
-<p>The CleanRL default Atari policy: a stack of three convolutions followed by a linear layer</p>
-<p>Takes framestack as a mandatory keyword arguments. Suggested default is 1 frame
-with LSTM or 4 frames without.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">critic</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">hidden</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">encode_observations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">observations</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Encodes a batch of observations into hidden states</p>
-<p>Call pufferlib.emulation.unpack_batched_obs at the start of this
-function to unflatten observations to their original structured form:</p>
-<dl class="simple">
-<dt>observations = pufferlib.emulation.unpack_batched_obs(</dt><dd><p>self.envs.structured_observation_space, env_outputs)</p>
-</dd>
-</dl>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><p><strong>flat_observations</strong> – A tensor of shape (batch, …, obs_size)</p>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, hidden_size)
-lookup: Tensor of (batch, …) that can be used to return additional embeddings</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>hidden</p>
-</dd>
-</dl>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">decode_actions</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">hidden</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lookup</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Decodes a batch of hidden states into multidiscrete actions</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>flat_hidden</strong> – Tensor of (batch, …, hidden_size)</p></li>
-<li><p><strong>lookup</strong> – Tensor of (batch, …), if returned by encode_observations</p></li>
-</ul>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, action_size)
-value: Tensor of (batch, …)</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>actions</p>
-</dd>
-</dl>
-<p>actions is a concatenated tensor of logits for each action space dimension.
-It should be of shape (batch, …, sum(action_space.nvec))</p>
-</dd></dl>
-
-</dd></dl>
-
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.magent.</span></span><span class="sig-name descname"><span class="pre">make_battle_v4_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>MAgent Battle creation function</p>
-</dd></dl>
-
-</section>
-<section id="microrts">
-<h2>MicroRTS<a class="headerlink" href="#microrts" title="Permalink to this heading">#</a></h2>
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.microrts.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>Gym MicroRTS creation function</p>
-<p>This library appears broken. Step crashes in Java.</p>
-</dd></dl>
-
-</section>
-<section id="nethack">
-<h2>NetHack<a class="headerlink" href="#nethack" title="Permalink to this heading">#</a></h2>
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.nethack.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd><p>NetHack binding creation function</p>
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.nethack.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Default NetHack Learning Environment policy ported from the nle release</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">encode_observations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_outputs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Encodes a batch of observations into hidden states</p>
-<p>Call pufferlib.emulation.unpack_batched_obs at the start of this
-function to unflatten observations to their original structured form:</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
 <dl class="simple">
-<dt>observations = pufferlib.emulation.unpack_batched_obs(</dt><dd><p>self.envs.structured_observation_space, env_outputs)</p>
+<dt>Make sure that your class’s metadata ‘render_modes’ key includes</dt><dd><p>the list of supported modes. It’s recommended to call super()
+in implementations to use the functionality of this method.</p>
 </dd>
 </dl>
+</div>
 <dl class="field-list simple">
 <dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><p><strong>flat_observations</strong> – A tensor of shape (batch, …, obs_size)</p>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, hidden_size)
-lookup: Tensor of (batch, …) that can be used to return additional embeddings</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>hidden</p>
-</dd>
-</dl>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">decode_actions</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">hidden</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lookup</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">concat</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Decodes a batch of hidden states into multidiscrete actions</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>flat_hidden</strong> – Tensor of (batch, …, hidden_size)</p></li>
-<li><p><strong>lookup</strong> – Tensor of (batch, …), if returned by encode_observations</p></li>
-</ul>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, action_size)
-value: Tensor of (batch, …)</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>actions</p>
+<dd class="field-odd"><p><strong>mode</strong> (<em>str</em>) – the mode to render with</p>
 </dd>
 </dl>
-<p>actions is a concatenated tensor of logits for each action space dimension.
-It should be of shape (batch, …, sum(action_space.nvec))</p>
-</dd></dl>
-
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.nethack.</span></span><span class="sig-name descname"><span class="pre">Crop</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">height</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">width</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">height_target</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">width_target</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Helper class for NetHackNet below.</p>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inputs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">coordinates</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Calculates centered crop around given x,y coordinates.
-Args:
-inputs [B x H x W]
-coordinates [B x 2] x,y coordinates
-Returns:
-[B x H’ x W’] inputs cropped and centered around x,y coordinates.</p>
-</dd></dl>
-
-<dl class="py attribute">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-</section>
-<section id="neural-mmo">
-<h2>Neural MMO<a class="headerlink" href="#neural-mmo" title="Permalink to this heading">#</a></h2>
-<dl class="py function">
-<dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.nmmo.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Neural MMO creation function</p>
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.nmmo.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">encode_observations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_outputs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Encodes a batch of observations into hidden states</p>
-<p>Call pufferlib.emulation.unpack_batched_obs at the start of this
-function to unflatten observations to their original structured form:</p>
+<p>Example:</p>
+<dl>
+<dt>class MyEnv(Env):</dt><dd><p>metadata = {‘render_modes’: [‘human’, ‘rgb_array’]}</p>
 <dl class="simple">
-<dt>observations = pufferlib.emulation.unpack_batched_obs(</dt><dd><p>self.envs.structured_observation_space, env_outputs)</p>
+<dt>def render(self, mode=’human’):</dt><dd><dl class="simple">
+<dt>if mode == ‘rgb_array’:</dt><dd><p>return np.array(…) # return RGB frame suitable for video</p>
 </dd>
-</dl>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><p><strong>flat_observations</strong> – A tensor of shape (batch, …, obs_size)</p>
+<dt>elif mode == ‘human’:</dt><dd><p>… # pop up a window and render</p>
 </dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, hidden_size)
-lookup: Tensor of (batch, …) that can be used to return additional embeddings</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>hidden</p>
-</dd>
-</dl>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">decode_actions</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">hidden</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lookup</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">concat</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Decodes a batch of hidden states into multidiscrete actions</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>flat_hidden</strong> – Tensor of (batch, …, hidden_size)</p></li>
-<li><p><strong>lookup</strong> – Tensor of (batch, …), if returned by encode_observations</p></li>
-</ul>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, action_size)
-value: Tensor of (batch, …)</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>actions</p>
-</dd>
-</dl>
-<p>actions is a concatenated tensor of logits for each action space dimension.
-It should be of shape (batch, …, sum(action_space.nvec))</p>
-</dd></dl>
-
-</dd></dl>
-
-</section>
-<section id="procgen">
-<h2>Procgen<a class="headerlink" href="#procgen" title="Permalink to this heading">#</a></h2>
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">ResidualBlock</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">channels</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Defines the computation performed at every call.</p>
-<p>Should be overridden by all subclasses.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>Although the recipe for forward pass needs to be defined within
-this function, one should call the <code class="xref py py-class docutils literal notranslate"><span class="pre">Module</span></code> instance afterwards
-instead of this since the former takes care of running the
-registered hooks while the latter silently ignores them.</p>
-</div>
-</dd></dl>
-
-<dl class="py attribute">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">ConvSequence</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input_shape</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p>
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Defines the computation performed at every call.</p>
-<p>Should be overridden by all subclasses.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>Although the recipe for forward pass needs to be defined within
-this function, one should call the <code class="xref py py-class docutils literal notranslate"><span class="pre">Module</span></code> instance afterwards
-instead of this since the former takes care of running the
-registered hooks while the latter silently ignores them.</p>
-</div>
-</dd></dl>
-
-<dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get_output_shape</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
-<dl class="py attribute">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">training</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">bool</span></em></dt>
-<dd></dd></dl>
-
-</dd></dl>
-
-<dl class="py class">
-<dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><dl class="py method">
-<dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">encode_observations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Encodes a batch of observations into hidden states</p>
-<p>Call pufferlib.emulation.unpack_batched_obs at the start of this
-function to unflatten observations to their original structured form:</p>
-<dl class="simple">
-<dt>observations = pufferlib.emulation.unpack_batched_obs(</dt><dd><p>self.envs.structured_observation_space, env_outputs)</p>
+<dt>else:</dt><dd><p>super(MyEnv, self).render(mode=mode) # just raise an exception</p>
 </dd>
 </dl>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><p><strong>flat_observations</strong> – A tensor of shape (batch, …, obs_size)</p>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>Tensor of (batch, …, hidden_size)
-lookup: Tensor of (batch, …) that can be used to return additional embeddings</p>
 </dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>hidden</p>
+</dl>
 </dd>
 </dl>
 </dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">decode_actions</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">hidden</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lookup</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>linear decoder function</p>
+<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd><p>Override close in your subclass to perform any necessary cleanup.</p>
+<p>Environments will automatically close() themselves when
+garbage collected or when the program exits.</p>
 </dd></dl>
 
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batched_obs</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
 </dd></dl>
 
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">ProcgenVecEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_name</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_envs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_levels</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">start_level</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">distribution_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'easy'</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>WIP Vectorized Procgen environment wrapper</p>
-<p>Does not use normal PufferLib emulation</p>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.emulation.</span></span><span class="sig-name descname"><span class="pre">PettingZooPufferEnv</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_creator=None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args=[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_cls=&lt;class</span> <span class="pre">'pufferlib.emulation.Postprocessor'&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">postprocessor_kwargs={}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">teams=None</span></span></em><span class="sig-paren">)</span></dt>
+<dd><dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">agents</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">done</span></span></dt>
+<dd></dd></dl>
+
 <dl class="py property">
 <dt class="sig sig-object py">
 <em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_observation_space</span></span></dt>
@@ -913,6 +442,18 @@ <h2>Procgen<a class="headerlink" href="#procgen" title="Permalink to this headin
 <em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_action_space</span></span></dt>
 <dd></dd></dl>
 
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">observation_space</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">agent</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Returns the observation space for a single agent</p>
+</dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">action_space</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">agent</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Returns the action space for a single agent</p>
+</dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
 <span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
@@ -921,39 +462,48 @@ <h2>Procgen<a class="headerlink" href="#procgen" title="Permalink to this headin
 <dl class="py method">
 <dt class="sig sig-object py">
 <span class="sig-name descname"><span class="pre">step</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
-<dd></dd></dl>
-
+<dd><p>Step the environment and return (observations, rewards, dones, infos)</p>
 </dd></dl>
 
-<dl class="py class">
+<dl class="py method">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">ProcgenPostprocessor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">is_multiagent</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">agent_id</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Postprocessors provide full access to the environment</p>
-<p>This means you can use them to cheat. Don’t blame us if you do.</p>
+<span class="sig-name descname"><span class="pre">render</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">features</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">reward_done_info</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">reward</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">info</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Called on the reward, done, and info after they are returned by the environment</p>
-</dd></dl>
+<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">batched_obs</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
 </dd></dl>
 
+</section>
+<section id="environments">
+<h1>Environments<a class="headerlink" href="#environments" title="Permalink to this heading">#</a></h1>
+<p>All included environments expose make_env and env_creator functions. In most cases, make_env is the one you want. env_creator exposes, for example, class interfaces for environments that support them, so that you can pass around static references.</p>
+<p>Additionally, all environments expose a Policy class with a baseline model. Not all environments have <em>custom</em> policies; the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, NetHack/MiniHack, and Pokemon Red currently have reasonable policies.</p>
+<p>The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these functions through, for example, pufferlib.environments.squared.make_env.</p>
 <dl class="py function">
 <dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.registry.procgen.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Atari creation function with default CleanRL preprocessing based on Stable Baselines3 wrappers</p>
+<span class="sig-prename descclassname"><span class="pre">pufferlib.environments.squared.environment.</span></span><span class="sig-name descname"><span class="pre">make_env</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">distance_to_target</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">3</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_targets</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Puffer Squared environment</p>
+</dd></dl>
+
+<dl class="py attribute">
+<dt class="sig sig-object py">
+<span class="sig-prename descclassname"><span class="pre">pufferlib.environments.squared.torch.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span></dt>
+<dd><p>alias of <code class="xref py py-class docutils literal notranslate"><span class="pre">Default</span></code></p>
 </dd></dl>
 
-</section>
 </section>
 <section id="models">
 <h1>Models<a class="headerlink" href="#models" title="Permalink to this heading">#</a></h1>
-<p>PufferLib model API and default policies</p>
+<p>PufferLib's default policies and optional model API. Neither is required to use PufferLib.</p>
 <dl class="py class">
 <dt class="sig sig-object py">
 <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.models.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
@@ -1126,31 +676,64 @@ <h1>Vectorization<a class="headerlink" href="#vectorization" title="Permalink to
 <p>Distributed backends for PufferLib-wrapped environments</p>
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Serial</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_workers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span></dt>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Serial</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">callable</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_envs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span 
class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">int</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_pool</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span></dt>
 <dd><p>Runs environments in serial on the main process</p>
 <p>Use this vectorization module for debugging environments</p>
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_action_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">structured_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">flat_observation_space</span></span></dt>
+<dd></dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">put</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">send</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">recv</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-</dd></dl>
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">async_reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
-<dl class="py class">
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">profile</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Multiprocessing</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_workers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Runs environments in parallel on multiple processes</p>
-<p>Use this module for most applications</p>
+<span class="sig-name descname"><span class="pre">step</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
 <span class="sig-name descname"><span class="pre">put</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
@@ -1170,177 +753,209 @@ <h1>Vectorization<a class="headerlink" href="#vectorization" title="Permalink to
 
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Ray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_workers</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Runs environments in parallel on multiple processes using Ray</p>
-<p>Use this module for distributed simulation on a cluster. It can also be
-faster than multiprocessing on a single machine for specific environments.</p>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Multiprocessing</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">callable</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_envs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span 
class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">int</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_pool</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Runs environments in parallel using multiprocessing</p>
+<p>Use this vectorization module for most applications</p>
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_action_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">structured_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">flat_observation_space</span></span></dt>
+<dd></dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">put</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">send</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">recv</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-</dd></dl>
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">async_reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
-</section>
-<section id="cleanrl-integration">
-<h1>CleanRL Integration<a class="headerlink" href="#cleanrl-integration" title="Permalink to this heading">#</a></h1>
-<p>Wrap your PyTorch policies for use with CleanRL</p>
-<dl class="py class">
+<dl class="py method">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.cleanrl.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Wrap a non-recurrent PyTorch model for use with CleanRL</p>
+<span class="sig-name descname"><span class="pre">profile</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">step</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">put</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">get</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get_action_and_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">action</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 </dd></dl>
 
-<p>Recurrence requires you to subclass our base policy instead. See the default policies for examples.</p>
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.cleanrl.</span></span><span class="sig-name descname"><span class="pre">RecurrentPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Wrap a recurrent PyTorch model for use with CleanRL</p>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.vectorization.</span></span><span class="sig-name descname"><span class="pre">Ray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">env_creator</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">callable</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">{}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_envs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_worker</span></span><span class="p"><span 
class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">envs_per_batch</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">int</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_pool</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Runs environments in parallel on multiple processes using Ray</p>
+<p>Use this module for distributed simulation on a cluster. It can also be
+faster than multiprocessing on a single machine for specific environments.</p>
 <dl class="py property">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">lstm</span></span></dt>
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">single_action_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">structured_observation_space</span></span></dt>
+<dd></dd></dl>
+
+<dl class="py property">
+<dt class="sig sig-object py">
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">flat_observation_space</span></span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">unpack_batched_obs</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">obs</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">get_action_and_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">action</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">done</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">send</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-</dd></dl>
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">recv</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
-</section>
-<section id="rllib-binding">
-<h1>RLlib Binding<a class="headerlink" href="#rllib-binding" title="Permalink to this heading">#</a></h1>
-<p>Wrap your policies for use with RLlib (WIP)</p>
-<p>RLlib support under construction as we focus on stable CleanRL support for v0.2</p>
-<p>Still supported via Discord, but not officially stable</p>
-<dl class="py function">
+<dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">register_env</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">name</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">env_creator</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">async_reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-<dl class="py function">
+<dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">read_checkpoints</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tune_path</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">profile</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-<dl class="py function">
+<dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">create_policies</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n</span></span></em><span class="sig-paren">)</span></dt>
+<span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">seed</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
 <dd></dd></dl>
 
-<dl class="py function">
+<dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">make_policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">policy_cls</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lstm_layers</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Wrap a PyTorch model for use with RLLib</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>policy_cls</strong> – A pufferlib.models.Policy subclass that implements the PufferLib model API</p></li>
-<li><p><strong>lstm_layers</strong> – The number of LSTM layers to use. If 0, no LSTM is used</p></li>
-</ul>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>A new RLlib model class wrapping your model</p>
-</dd>
-</dl>
+<span class="sig-name descname"><span class="pre">step</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">actions</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">put</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">get</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">close</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
 </dd></dl>
 
+</section>
+<section id="cleanrl-integration">
+<h1>CleanRL Integration<a class="headerlink" href="#cleanrl-integration" title="Permalink to this heading">#</a></h1>
+<p>Wrap your PyTorch policies for use with CleanRL</p>
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">RLPredictor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">policy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Policy</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">preprocessor</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">Preprocessor</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Subclasseses must call Predictor.__init__() to set a preprocessor.</p>
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.cleanrl.</span></span><span class="sig-name descname"><span class="pre">Policy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Wrap a non-recurrent PyTorch model for use with CleanRL</p>
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">predict</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">data</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Perform inference on a batch of data.</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>data</strong> – A batch of input data of type <code class="docutils literal notranslate"><span class="pre">DataBatchType</span></code>.</p></li>
-<li><p><strong>kwargs</strong> – Arguments specific to predictor implementations. These are passed</p></li>
-<li><p><strong>_predict_pandas.</strong> (<em>directly to</em>) – </p></li>
-</ul>
-</dd>
-<dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p><dl class="simple">
-<dt>Prediction result. The return type will be the same as the</dt><dd><p>input type.</p>
-</dd>
-</dl>
-</p>
-</dd>
-<dt class="field-odd">Return type<span class="colon">:</span></dt>
-<dd class="field-odd"><p>DataBatchType</p>
-</dd>
-</dl>
-</dd></dl>
+<span class="sig-name descname"><span class="pre">get_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">get_action_and_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">action</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
 </dd></dl>
 
+<p>Wrap your PyTorch policies for use with CleanRL but add an LSTM. This requires you to use our policy API. It’s pretty simple – see the default policies for examples.</p>
 <dl class="py class">
 <dt class="sig sig-object py">
-<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.rllib.</span></span><span class="sig-name descname"><span class="pre">Callbacks</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">legacy_callbacks_dict</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><span class="pre">Dict</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">callable</span><span class="p"><span class="pre">]</span></span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
-<dd><dl class="py method">
+<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pufferlib.frameworks.cleanrl.</span></span><span class="sig-name descname"><span class="pre">RecurrentPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Any</span></span></em><span class="sig-paren">)</span></dt>
+<dd><p>Wrap a recurrent PyTorch model for use with CleanRL</p>
+<dl class="py property">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">on_train_result</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">algorithm</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">result</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">trainer</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#x2192;</span> <span class="sig-return-typehint"><span class="pre">None</span></span></span></dt>
-<dd><p>Run after 1 epoch at the trainer level</p>
-</dd></dl>
+<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">lstm</span></span></dt>
+<dd></dd></dl>
 
 <dl class="py method">
 <dt class="sig sig-object py">
-<span class="sig-name descname"><span class="pre">on_episode_end</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">worker</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">base_env</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">policies</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">episode</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span></dt>
-<dd><p>Runs when an episode is done.</p>
-<dl class="field-list simple">
-<dt class="field-odd">Parameters<span class="colon">:</span></dt>
-<dd class="field-odd"><ul class="simple">
-<li><p><strong>worker</strong> – Reference to the current rollout worker.</p></li>
-<li><p><strong>base_env</strong> – BaseEnv running the episode. The underlying
-sub environment objects can be retrieved by calling
-<cite>base_env.get_sub_environments()</cite>.</p></li>
-<li><p><strong>policies</strong> – Mapping of policy id to policy
-objects. In single agent mode there will only be a single
-“default_policy”.</p></li>
-<li><p><strong>episode</strong> – Episode object which contains episode
-state. You can use the <cite>episode.user_data</cite> dict to store
-temporary data, and <cite>episode.custom_metrics</cite> to store custom
-metrics for the episode.
-In case of environment failures, episode may also be an Exception
-that gets thrown from the environment before the episode finishes.
-Users of this callback may then handle these error cases properly
-with their custom logics.</p></li>
-<li><p><strong>kwargs</strong> – Forward compatibility placeholder.</p></li>
-</ul>
-</dd>
-</dl>
-</dd></dl>
+<span class="sig-name descname"><span class="pre">get_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
+
+<dl class="py method">
+<dt class="sig sig-object py">
+<span class="sig-name descname"><span class="pre">get_action_and_value</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">state</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">action</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span></dt>
+<dd></dd></dl>
 
 </dd></dl>
 
+</section>
+<section id="rllib-binding">
+<h1>RLlib Binding<a class="headerlink" href="#rllib-binding" title="Permalink to this heading">#</a></h1>
+<p>Wrap your policies for use with RLlib (Shelved until RLlib is more stable)</p>
+</section>
+<section id="sb3-binding">
+<h1>SB3 Binding<a class="headerlink" href="#sb3-binding" title="Permalink to this heading">#</a></h1>
+<p>Coming soon!</p>
 </section>
 
         </article>
@@ -1353,7 +968,7 @@ <h1>RLlib Binding<a class="headerlink" href="#rllib-binding" title="Permalink to
                 <div class="context">
                   <span>Next</span>
                 </div>
-                <div class="title">PufferLib 0.4: Ready to Take on Bigger Fish</div>
+                <div class="title">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</div>
               </div>
               <svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
             </a>
diff --git a/docs/build/html/rst/blog.html b/docs/build/html/rst/blog.html
index 23783aa..204cb1c 100644
--- a/docs/build/html/rst/blog.html
+++ b/docs/build/html/rst/blog.html
@@ -6,7 +6,7 @@
 <link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="prev" title="Emulation" href="api.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
-        <title>PufferLib 0.4: Ready to Take on Bigger Fish - PufferLib 0.4.3 documentation</title>
+        <title>PufferLib 0.5: A Bigger EnvPool for Growing Puffers - PufferLib 0.5.0 documentation</title>
       <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="../_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -190,7 +190,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="../index.html"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="../index.html"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -213,7 +213,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -231,15 +231,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="api.html#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul class="current">
-<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
@@ -281,7 +283,60 @@
     <source src="../_static/banner.mp4" type="video/mp4">
     Your browser does not support this video.
   </video>
-</center><section id="pufferlib-0-4-ready-to-take-on-bigger-fish">
+</center><section id="pufferlib-0-5-a-bigger-envpool-for-growing-puffers">
+<h1>PufferLib 0.5: A Bigger EnvPool for Growing Puffers<a class="headerlink" href="#pufferlib-0-5-a-bigger-envpool-for-growing-puffers" title="Permalink to this heading">#</a></h1>
+<p>This is what reinforcement learning does to your CPU utilization.</p>
+<figure class="align-default">
+<img alt="../_images/0-5_blog_header.png" src="../_images/0-5_blog_header.png" />
+</figure>
+<p>You wouldn’t pack a box this way, right? With PufferLib 0.5, we are releasing a Python implementation of EnvPool to solve this problem. <strong>TL;DR: ~20% performance improvement across most workloads, up to 2x for complex environments, and native multiagent support.</strong></p>
+<figure class="align-default">
+<img alt="../_images/0-5_blog_envpool.png" src="../_images/0-5_blog_envpool.png" />
+</figure>
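+<p>If you just want the enhancements, you can <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">-U</span> <span class="pre">pufferlib</span></code>. But if you’d like to see a bit behind the curtain, read on!</p>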
+<section id="the-simulation-crisis">
+<h2>The Simulation Crisis<a class="headerlink" href="#the-simulation-crisis" title="Permalink to this heading">#</a></h2>
+<p>You want to do some RL research, so you install Atari. Say it runs at 1000 steps/second on 1 core and 5000 steps/second on 6 cores. Now, you decide you want to work on a more interesting environment and happen upon Neural MMO, a brilliant project that must have been developed by a truly fantastic team. It runs at 1500 steps/second – faster than Atari! So you scale it up to 6 cores and it runs at … 1800 steps per second. What gives?</p>
+<p>The problem is that environments simulated on different cores do not run at the same speed. Even if they did, many modern CPUs have cores that run at different speeds. Parallelization overhead is mostly the sum of:</p>
+<ul class="simple"><li><p>Launching/synchronization overhead. This is roughly 0.1 ms per process and is linear in the number of processes. At ~100 steps per second, you can ignore it. At &gt;10,000 steps/second, it is the main limiting factor.</p></li>
+<li><p>Environment variance. This is defined by the ratio std/mu of the environment simulation time, and its overhead scales with the square root of the number of processes. For 24 processes, 10% std is 20% overhead and 100% std is 300% overhead.</p></li>
+<li><p>Different core speeds. Many modern CPUs, especially Intel desktop series processors, feature additional cores that are ~20% slower than the main cores.</p></li>
+<li><p>Model latency. This is the time taken to transfer observations to GPU, run the model, and transfer actions to CPU. It is not technically part of multiprocessing overhead, but naive implementations will leave CPUs idle during model inference.</p></li></ul>
+<p>As a rule of thumb, simple RL environments have &lt; 10% variance because the code is always simulating roughly the same thing. Complex environments, especially ones with variable numbers of agents, can have &gt; 100% variance because different code runs depending on the current state. On the other hand, if your environment has 100 agents, you are effectively running 100x fewer simulations for the same data, so launching/synchronization overhead is lower.</p>
+</section>
+<section id="the-solution">
+<h2>The Solution<a class="headerlink" href="#the-solution" title="Permalink to this heading">#</a></h2>
+<p>Run multiple environments per process if your environment runs at &gt; ~2000 sps with variance &lt; ~10%. This reduces the impact of launching/synchronization overhead and also reduces variance because you are summing over samples. In PufferLib, we typically enable this only for environments &gt; ~5000 sps because of interactions with the optimizations below.</p>
+<p>Simulate multiple buffers of environments so that one buffer is running while your model is processing observations from the other. This technique was introduced by <a class="reference external" href="https://github.com/alex-petrenko/sample-factory">https://github.com/alex-petrenko/sample-factory</a> and does not speed up simulation, but it allows you to interleave simulations from two sets of environments. It’s a good trick, but it is superseded by the final optimization, which is faster and simpler.</p>
+<p>Run a pool of environments and sample from the first ones to finish stepping. For example, if you want a batch of 24 observations, you might run 64 environments. At each step, the 24 for which you have computed actions are going to take a while to simulate, but you can still select the fastest 24 from the other 64-24=40 environments. This technique was introduced by <a class="reference external" href="https://github.com/sail-sg/envpool">https://github.com/sail-sg/envpool</a> and is massively effective, but the original implementation is only for specific C/C++ environments. PufferLib’s implementation is in Python, so it is slower, but it works for arbitrary Python environments and includes native multiagent support.</p>
+</section>
+<section id="experiments">
+<h2>Experiments<a class="headerlink" href="#experiments" title="Permalink to this heading">#</a></h2>
+<p>To evaluate the performance of different backends, I am using a 13900K (24 cores) on a max-spec Maingear desktop running a minimal Debian 12 install. We test 9 different simulated environments: 1e-2 to 1e-4 mean delay with 0-100% delay std. For each environment, we spawn 1, 6, 24, 96, and 192 processes for each backend tested (Gymnasium’s and PufferLib’s serial and multiprocessing implementations + PufferLib’s pool). We also have Ray implementations compatible with our pooling code, but that will be a separate post. Additionally, PufferLib implementations sweep over (1, 2, 4) environments per process and PufferLib pool will compute 24 observations at a time. We do not consider model latency, which can yield another 2x relative performance for pooling on specific workloads.</p>
+<figure class="align-default">
+<img alt="../_images/0-5_blog_envpool.png" src="../_images/0-5_blog_envpool.png" />
+</figure>
+<p>9 groups of bars, each for one environment. 5 groups of bars per environment, each for a specific number of processes. The serial Gymnasium/PufferLib experiments match in all cases. The best PufferLib settings are 10-20% faster than the best Gymnasium settings for all workloads and can be up to 2x faster for environments with a high standard deviation in important cases (for instance, you may not want to run 192 copies of heavy environments). Again, this is before even considering the time saved by interleaving with the model forward pass.</p>
+<p>All of the implementations start to dip ~10% at 1,000 steps/second and ~50% at 10,000 steps/second. To make absolutely sure that this overhead is unavoidable, I reimplemented the entire pool architecture as minimally as possible, without any of the environment wrapper or data transfer overhead:</p>
+<pre>SPS: 10734.36 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: False
+SPS: 11640.42 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: True
+SPS: 32715.65 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: False
+SPS: 27635.31 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: True
+SPS: 22681.48 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: False
+SPS: 26183.73 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 24 sync: False
+SPS: 30120.75 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: True</pre>
+<p>As it turns out, Python’s multiprocessing caps around 10,000 steps per second per worker. There is still room for improvement by running more environments per process, but at this speed, small optimizations to the data processing code start to matter much more.</p>
+</section>
+<section id="technical-details-and-gotchas">
+<h2>Technical Details and Gotchas<a class="headerlink" href="#technical-details-and-gotchas" title="Permalink to this heading">#</a></h2>
+<p>PufferLib’s vectorization library is extremely concise – around 800 lines for serial, multiprocessing, and ray backends with support for PufferLib’s Gymnasium and PettingZoo wrappers. Adding envpool only required changing around 100 lines of code but required a lot of performance testing:</p>
+<ul class="simple"><li><p>Don’t use multiprocessing.Queue. There’s no fast way to poll which processes are done. Instead, use multiprocessing.Pipe and poll with selectors. I have not seen noticeable overhead from this in any of my tests.</p></li>
+<li><p>Don’t use time.sleep(), as this will trigger context switching, or time.time(), as this will include time spent on other processes. Use time.process_time() if you want an equal slice per core or count to ~150M/second (time it on your machine) if you want a fixed amount of work.</p></li></ul>
+<p>The ray backend was extremely easy to implement thanks to ray.wait(). It is unfortunately too slow for most environments, but I wish standard multiprocessing used the Ray API, if not the architecture. The library itself has some cleanup issues that can cause crashes during heavy performance tests, which is why results are not included in this post.</p>
+<p>There’s one other thing I want to mention for people looking at the code. I was doing some experimental procedural stuff testing different programming paradigms, so the actual class interfaces are in __init__. It’s pretty much equivalent to one subclass per backend.</p>
+</section>
+</section>
+<section id="pufferlib-0-4-ready-to-take-on-bigger-fish">
 <h1>PufferLib 0.4: Ready to Take on Bigger Fish<a class="headerlink" href="#pufferlib-0-4-ready-to-take-on-bigger-fish" title="Permalink to this heading">#</a></h1>
 <p>PufferLib 0.4 is out now! Make your RL environments and libraries play nice with one-line wrappers, pain-free vectorization, and more.</p>
 <div class="sd-card sd-sphinx-override sd-w-75 sd-mt-4 sd-mb-2 sd-ml-auto sd-mr-auto sd-shadow-sm sd-card-hover sd-text-center docutils">
@@ -451,7 +506,14 @@ <h2>Next Steps<a class="headerlink" href="#next-steps" title="Permalink to this
         <div class="toc-tree-container">
           <div class="toc-tree">
             <ul>
-<li><a class="reference internal" href="#">PufferLib 0.4: Ready to Take on Bigger Fish</a><ul>
+<li><a class="reference internal" href="#">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a><ul>
+<li><a class="reference internal" href="#the-simulation-crisis">The Simulation Crisis</a></li>
+<li><a class="reference internal" href="#the-solution">The Solution</a></li>
+<li><a class="reference internal" href="#experiments">Experiments</a></li>
+<li><a class="reference internal" href="#technical-details-and-gotchas">Technical Details and Gotchas</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a><ul>
 <li><a class="reference internal" href="#emulation">Emulation</a></li>
 <li><a class="reference internal" href="#vectorization">Vectorization</a></li>
 <li><a class="reference internal" href="#puffertank">PufferTank</a></li>
diff --git a/docs/build/html/rst/landing.html b/docs/build/html/rst/landing.html
index a7767ba..fff6cc6 100644
--- a/docs/build/html/rst/landing.html
+++ b/docs/build/html/rst/landing.html
@@ -6,7 +6,7 @@
 <link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="Emulation" href="api.html" /><link rel="prev" title="Index" href="../index.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
-        <title>Libraries - PufferLib 0.4.3 documentation</title>
+        <title>Libraries - PufferLib 0.5.0 documentation</title>
       <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="../_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -190,7 +190,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="../index.html"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="../index.html"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -213,7 +213,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -231,15 +231,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="api.html#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
@@ -310,7 +312,7 @@
 </div><div class="line-block">
 <div class="line"><br /></div>
 </div>
-<p>Join our community Discord for support and Discussion, follow my Twitter for news, and star the repo to feed the puffer. <a class="reference download internal" download="" href="../_downloads/b6289fa6a05068cc61ddac77ec727f3f/neurips_2023_aloe.pdf"><code class="xref download docutils literal notranslate"><span class="pre">Whitepaper</span></code></a> appearing at NeurIPS 2023 ALOE Workshop. Come say hi!</p>
+<p>Join our community Discord for support and discussion, follow my Twitter for news, and star the repo to feed the puffer. We also have a <a class="reference download internal" download="" href="../_downloads/b6289fa6a05068cc61ddac77ec727f3f/neurips_2023_aloe.pdf"><code class="xref download docutils literal notranslate"><span class="pre">Whitepaper</span></code></a> featured at the NeurIPS 2023 ALOE workshop.</p>
 <details class="sd-sphinx-override sd-dropdown sd-card sd-mb-3">
 <summary class="sd-summary-title sd-card-header">
 Installation<div class="sd-summary-down docutils">
@@ -323,7 +325,7 @@
 </input><label class="sd-tab-label" for="sd-tab-item-0">
 PufferTank</label><div class="sd-tab-content docutils">
 <p class="sd-card-text"><a class="reference external" href="https://github.com/pufferai/puffertank">PufferTank</a> is a GPU container with PufferLib and dependencies for all environments in the registry, including some that are slow and tricky to install.</p>
-<p class="sd-card-text">If you are new to containers, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.</p>
+<p class="sd-card-text">If you have not used containers before and just want everything to work, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.</p>
 </div>
 <input id="sd-tab-item-1" name="sd-tab-set-0" type="radio">
 </input><label class="sd-tab-label" for="sd-tab-item-1">
@@ -348,42 +350,47 @@
 <svg version="1.1" width="1.5em" height="1.5em" class="sd-octicon sd-octicon-chevron-up" viewBox="0 0 24 24" aria-hidden="true"><path fill-rule="evenodd" d="M18.78 15.28a.75.75 0 000-1.06l-6.25-6.25a.75.75 0 00-1.06 0l-6.25 6.25a.75.75 0 101.06 1.06L12 9.56l5.72 5.72a.75.75 0 001.06 0z"></path></svg></div>
 </summary><div class="sd-summary-content sd-card-body docutils">
 <p class="sd-card-text"><strong>Joseph Suarez</strong>: Creator and developer of PufferLib</p>
-<p class="sd-card-text"><strong>David Bloomin</strong>: Policy pool/store/selector</p>
+<p class="sd-card-text"><strong>David Bloomin</strong>: 0.4 policy pool/store/selector</p>
 <p class="sd-card-text"><strong>Nick Jenkins</strong>: Layout for the system architecture diagram. Adversary.design.</p>
 <p class="sd-card-text"><strong>Andranik Tigranyan</strong>: Streamline and animate the pufferfish. Hire him on UpWork if you like what you see here.</p>
 <p class="sd-card-text"><strong>Sara Earle</strong>: Original pufferfish model. Hire her on UpWork if you like what you see here.</p>
 </div>
 </details><p><strong>You can open this guide in a Colab notebook by clicking the demo button at the top of this page</strong></p>
-<p>Complex environments may have heirarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib’s emulation layer makes every environment look like it has flat observations and actions and a constant number of agents, with no changes to the underlying environment. Here’s how it works with two notoriously complex environments, NetHack and Neural MMO.</p>
+<p>Complex environments may have hierarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib’s emulation layer makes every environment look like it has flat observations/actions and a constant number of agents. Here’s how it works with NetHack and Neural MMO, two notoriously complex environments.</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pufferlib.emulation</span>
+<span class="kn">import</span> <span class="nn">pufferlib.wrappers</span>
 
 <span class="kn">import</span> <span class="nn">nle</span><span class="o">,</span> <span class="nn">nmmo</span>
 
 <span class="k">def</span> <span class="nf">nmmo_creator</span><span class="p">():</span>
-    <span class="k">return</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">PettingZooPufferEnv</span><span class="p">(</span><span class="n">env_creator</span><span class="o">=</span><span class="n">nmmo</span><span class="o">.</span><span class="n">Env</span><span class="p">)</span>
+    <span class="n">env</span> <span class="o">=</span> <span class="n">nmmo</span><span class="o">.</span><span class="n">Env</span><span class="p">()</span>
+    <span class="n">env</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">wrappers</span><span class="o">.</span><span class="n">PettingZooTruncatedWrapper</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
+    <span class="k">return</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">PettingZooPufferEnv</span><span class="p">(</span><span class="n">env</span><span class="o">=</span><span class="n">env</span><span class="p">)</span>
 
 <span class="k">def</span> <span class="nf">nethack_creator</span><span class="p">():</span>
-    <span class="k">return</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">GymPufferEnv</span><span class="p">(</span><span class="n">env_creator</span><span class="o">=</span><span class="n">nle</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">NLE</span><span class="p">)</span>
+    <span class="k">return</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">GymnasiumPufferEnv</span><span class="p">(</span><span class="n">env_creator</span><span class="o">=</span><span class="n">nle</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">NLE</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>You can pass envs by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.</p>
+<p>The wrappers give you back a Gymnasium/PettingZoo compliant environment. There is no loss of generality and no change to the underlying environment. You can wrap environments by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pufferlib.vectorization</span>
 
-<span class="c1"># vec = pufferlib.vectorization.Serial</span>
-<span class="n">vec</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">vectorization</span><span class="o">.</span><span class="n">Multiprocessing</span>
+<span class="n">vec</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">vectorization</span><span class="o">.</span><span class="n">Serial</span>
+<span class="c1"># vec = pufferlib.vectorization.Multiprocessing</span>
 <span class="c1"># vec = pufferlib.vectorization.Ray</span>
 
-<span class="n">envs</span> <span class="o">=</span> <span class="n">vec</span><span class="p">(</span><span class="n">nmmo_creator</span><span class="p">,</span> <span class="n">num_workers</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">envs_per_worker</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
+<span class="c1"># Vectorization API. Specify total number of environments and number per worker</span>
+<span class="c1"># Setting env_pool=True can be much faster but requires some tweaks to learning code</span>
+<span class="n">envs</span> <span class="o">=</span> <span class="n">vec</span><span class="p">(</span><span class="n">nmmo_creator</span><span class="p">,</span> <span class="n">num_envs</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">envs_per_worker</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">env_pool</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
 
-<span class="n">sync</span> <span class="o">=</span> <span class="kc">True</span>
-<span class="k">if</span> <span class="n">sync</span><span class="p">:</span>
-    <span class="n">obs</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
-<span class="k">else</span><span class="p">:</span>
-    <span class="n">envs</span><span class="o">.</span><span class="n">async_reset</span><span class="p">()</span>
-    <span class="n">obs</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">recv</span><span class="p">()</span>
+<span class="c1"># Synchronous API - reset/step</span>
+<span class="c1"># obs = envs.reset()[0]</span>
+
+<span class="c1"># Asynchronous API - async_reset, send/recv</span>
+<span class="n">envs</span><span class="o">.</span><span class="n">async_reset</span><span class="p">()</span>
+<span class="n">obs</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">recv</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
 </pre></div>
 </div>
-<p>We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.</p>
+<p>Our backends support asynchronous on-policy sampling through a Python implementation of EnvPool. This makes them <em>faster</em> than the implementations that ship with most RL libraries. We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.</p>
 <p>PufferLib allows you to write vanilla PyTorch policies and use them with multiple learning libraries. We take care of the details of converting between the different APIs. Here’s a policy that will work with <em>any</em> environment, with a one-line wrapper for CleanRL.</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
 <span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span>
@@ -392,7 +399,7 @@
 <span class="kn">import</span> <span class="nn">pufferlib.frameworks.cleanrl</span>
 
 <span class="k">class</span> <span class="nc">Policy</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
-    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">envs</span><span class="p">):</span>
+    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span>
         <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
         <span class="bp">self</span><span class="o">.</span><span class="n">encoder</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">prod</span><span class="p">(</span>
             <span class="n">envs</span><span class="o">.</span><span class="n">single_observation_space</span><span class="o">.</span><span class="n">shape</span><span class="p">),</span> <span class="mi">128</span><span class="p">)</span>
@@ -411,43 +418,41 @@
 <span class="n">policy</span> <span class="o">=</span> <span class="n">Policy</span><span class="p">(</span><span class="n">envs</span><span class="o">.</span><span class="n">driver_env</span><span class="p">)</span>
 <span class="n">cleanrl_policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">frameworks</span><span class="o">.</span><span class="n">cleanrl</span><span class="o">.</span><span class="n">Policy</span><span class="p">(</span><span class="n">policy</span><span class="p">)</span>
 <span class="n">actions</span> <span class="o">=</span> <span class="n">cleanrl_policy</span><span class="o">.</span><span class="n">get_action_and_value</span><span class="p">(</span><span class="n">obs</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span>
-<span class="n">obs</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">dones</span><span class="p">,</span> <span class="n">infos</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">actions</span><span class="p">)</span>
+<span class="n">obs</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">terminals</span><span class="p">,</span> <span class="n">truncateds</span><span class="p">,</span> <span class="n">infos</span><span class="p">,</span> <span class="n">env_id</span><span class="p">,</span> <span class="n">mask</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">actions</span><span class="p">)</span>
 <span class="n">envs</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
 </pre></div>
 </div>
-<p>There’s also a lightweight, fully optional base policy class for PufferLib. It breaks the forward pass into two functions, encode_observations and decode_actions. The advantage of this is that it lets us handle recurrance for you, since every framework does this a bit differently.</p>
-<p>So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide a registry of environments and models. Here’s a complete example.</p>
+<p>There’s also an optional policy base class for PufferLib. It just breaks the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here’s a complete example.</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
 
 <span class="kn">import</span> <span class="nn">pufferlib.models</span>
 <span class="kn">import</span> <span class="nn">pufferlib.vectorization</span>
 <span class="kn">import</span> <span class="nn">pufferlib.frameworks.cleanrl</span>
-<span class="kn">import</span> <span class="nn">pufferlib.registry.nmmo</span>
+<span class="kn">import</span> <span class="nn">pufferlib.environments.nmmo</span>
 
 <span class="n">envs</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">vectorization</span><span class="o">.</span><span class="n">Multiprocessing</span><span class="p">(</span>
-    <span class="n">env_creator</span><span class="o">=</span><span class="n">pufferlib</span><span class="o">.</span><span class="n">registry</span><span class="o">.</span><span class="n">nmmo</span><span class="o">.</span><span class="n">make_env</span><span class="p">,</span>
-    <span class="n">num_workers</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">envs_per_worker</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
+    <span class="n">env_creator</span><span class="o">=</span><span class="n">pufferlib</span><span class="o">.</span><span class="n">environments</span><span class="o">.</span><span class="n">nmmo</span><span class="o">.</span><span class="n">make_env</span><span class="p">,</span>
+    <span class="n">num_envs</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">envs_per_worker</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
 
-<span class="n">policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">registry</span><span class="o">.</span><span class="n">nmmo</span><span class="o">.</span><span class="n">Policy</span><span class="p">(</span><span class="n">envs</span><span class="o">.</span><span class="n">driver_env</span><span class="p">)</span>
-<span class="n">policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">RecurrentWrapper</span><span class="p">(</span><span class="n">envs</span><span class="p">,</span> <span class="n">policy</span><span class="p">,</span>
-    <span class="n">input_size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span> <span class="n">hidden_size</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
-<span class="n">cleanrl_policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">frameworks</span><span class="o">.</span><span class="n">cleanrl</span><span class="o">.</span><span class="n">RecurrentPolicy</span><span class="p">(</span><span class="n">policy</span><span class="p">)</span>
+<span class="n">policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">environments</span><span class="o">.</span><span class="n">nmmo</span><span class="o">.</span><span class="n">Policy</span><span class="p">(</span><span class="n">envs</span><span class="o">.</span><span class="n">driver_env</span><span class="p">)</span>
+<span class="n">cleanrl_policy</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">frameworks</span><span class="o">.</span><span class="n">cleanrl</span><span class="o">.</span><span class="n">Policy</span><span class="p">(</span><span class="n">policy</span><span class="p">)</span>
 
-<span class="n">obs</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
-<span class="n">obs</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
-<span class="n">state</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">256</span><span class="p">)),</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">256</span><span class="p">))]</span>
-<span class="n">actions</span> <span class="o">=</span> <span class="n">cleanrl_policy</span><span class="o">.</span><span class="n">get_action_and_value</span><span class="p">(</span><span class="n">obs</span><span class="p">,</span> <span class="n">state</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span>
-<span class="n">obs</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">dones</span><span class="p">,</span> <span class="n">infos</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">actions</span><span class="p">)</span>
+<span class="n">env_outputs</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">reset</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
+<span class="n">obs</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">(</span><span class="n">env_outputs</span><span class="p">)</span>
+<span class="n">actions</span> <span class="o">=</span> <span class="n">cleanrl_policy</span><span class="o">.</span><span class="n">get_action_and_value</span><span class="p">(</span><span class="n">obs</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span>
+<span class="n">obs</span><span class="p">,</span> <span class="n">rewards</span><span class="p">,</span> <span class="n">terminals</span><span class="p">,</span> <span class="n">truncateds</span><span class="p">,</span> <span class="n">infos</span><span class="p">,</span> <span class="n">env_id</span><span class="p">,</span> <span class="n">mask</span> <span class="o">=</span> <span class="n">envs</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">actions</span><span class="p">)</span>
 <span class="n">envs</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
 </pre></div>
 </div>
-<p>It’s that simple – almost. If you have an environment with structured observations, you’ll hvae to unpack them in the network forward pass since PufferLif will flatten them in emulation. We provide a utility for this – just be sure to save a reference to your environment inside of the model so you have access to the observation space.</p>
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">env_outputs</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">unpack_batched_obs</span><span class="p">(</span>
-    <span class="n">env_outputs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">envs</span><span class="o">.</span><span class="n">flat_observation_space</span>
+<p>It’s that simple – almost. If you have an environment with structured observations, you’ll have to unpack them in the network forward pass since PufferLib will flatten them in emulation. We provide a utility for this.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">obs</span> <span class="o">=</span> <span class="n">pufferlib</span><span class="o">.</span><span class="n">emulation</span><span class="o">.</span><span class="n">unpack_batched_obs</span><span class="p">(</span>
+    <span class="n">env_outputs</span><span class="p">,</span>
+    <span class="n">envs</span><span class="o">.</span><span class="n">driver_env</span><span class="o">.</span><span class="n">flat_observation_space</span><span class="p">,</span>
+    <span class="n">envs</span><span class="o">.</span><span class="n">driver_env</span><span class="o">.</span><span class="n">flat_observation_structure</span>
 <span class="p">)</span>
 </pre></div>
 </div>
-<p>That’s all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration.</p>
+<p>That’s all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. SB3 and other integrations coming soon!</p>
 <section id="libraries">
 <h1>Libraries<a class="headerlink" href="#libraries" title="Permalink to this heading">#</a></h1>
 <p>PufferLib’s emulation layer adheres to the Gym and PettingZoo APIs: you can use it with <em>any</em> environment and learning library (subject to Limitations). The libraries and environments below are just the ones we’ve tested. We also provide additional tools to make them easier to work with.</p>
@@ -467,7 +472,7 @@ <h1>Libraries<a class="headerlink" href="#libraries" title="Permalink to this he
 </div>
 <a class="sd-stretched-link reference external" href="https://colab.research.google.com/drive/1OMcaJnCAF1UiCJxKIxSS-RdZTuonItYT?usp=sharing"></a></div>
 <p>Or view it on GitHub <a class="reference external" href="https://github.com/PufferAI/PufferLib/blob/experimental/cleanrl_ppo_atari.py">here</a></p>
-<p>We are also working on a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. It’s still under development, but you can try it out <a class="reference external" href="https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py">here</a></p>
+<p>PufferLib also includes a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. You can try it out <a class="reference external" href="https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py">here</a></p>
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
         <a href="https://github.com/anyscale/ray" target="_blank">
@@ -477,11 +482,11 @@ <h1>Libraries<a class="headerlink" href="#libraries" title="Permalink to this he
     <div>
         <p><a href="https://docs.ray.io/">Ray</a> is a general purpose distributed computing framework that includes <a href="https://docs.ray.io/en/latest/rllib">RLlib</a>, an industry reinforcement learning library.</p>
     </div>
-</div><p>While RLlib is great on paper, there are currently a few issues. The pre-gymnasium 2.0 release is very buggy and has next to no error checking on the user API. The latest version may be more stable, but it pins a very recent version of Gymnasium that breaks compatiblity with many environments. We have a simple running script <a class="reference external" href="https://github.com/PufferAI/PufferLib/blob/experimental/rllib_ppo.py">here</a> that works with 2.0 for now. We will update this when the situation improves.</p>
+</div><p>We have previously supported RLlib and may again in the future. RLlib has not received updates in a while, and the current release is very buggy. We will update this if the situation improves.</p>
 </section>
 <section id="environments">
 <h1>Environments<a class="headerlink" href="#environments" title="Permalink to this heading">#</a></h1>
-<p>We also provide a registry of environments and models that are supported out of the box. These environments are already set up for you in PufferTank and are used in our test cases to ensure they work with PufferLib. Several also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.</p>
+<p>We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.</p>
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
         <a href="https://github.com/openai/gym" target="_blank">
@@ -495,12 +500,12 @@ <h1>Environments<a class="headerlink" href="#environments" title="Permalink to t
 
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-        <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
-            <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
+        <a href="https://github.com/PWhiddy/PokemonRedExperiments" target="_blank">
+            <img src="https://img.shields.io/github/stars/PWhiddy/PokemonRedExperiments?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Pokemon Red" width="100px">
         </a>
     </div>
     <div>
-        <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+        <p><a href="https://github.com/PWhiddy/PokemonRedExperiments">Pokemon Red</a> is one of the original Pokemon games for the Game Boy. This project uses the game as an environment for reinforcement learning. We are actively supporting development on this one!</p>
     </div>
 </div>
 
@@ -517,12 +522,23 @@ <h1>Environments<a class="headerlink" href="#environments" title="Permalink to t
 
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-        <a href="https://github.com/neuralmmo/environment" target="_blank">
-            <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+        <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
+            <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
         </a>
     </div>
     <div>
-        <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+        <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+    </div>
+</div>
+
+<div style="display: flex; align-items: center; margin-bottom: 15px;">
+    <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+        <a href="https://github.com/Farama-Foundation/Minigrid" target="_blank">
+            <img src="https://img.shields.io/github/stars/Farama-Foundation/Minigrid?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Minigrid" width="100px">
+        </a>
+    </div>
+    <div>
+        <p><a href="https://github.com/Farama-Foundation/Minigrid">Minigrid</a> is a 2D grid-world environment engine and a collection of built-in environments. It targets flexible, computationally efficient RL research.</p>
     </div>
 </div>
 
@@ -537,6 +553,17 @@ <h1>Environments<a class="headerlink" href="#environments" title="Permalink to t
     </div>
 </div>
 
+<div style="display: flex; align-items: center; margin-bottom: 15px;">
+    <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+        <a href="https://github.com/neuralmmo/environment" target="_blank">
+            <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+        </a>
+    </div>
+    <div>
+        <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+    </div>
+</div>
+
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
         <a href="https://github.com/openai/procgen" target="_blank">
@@ -559,6 +586,17 @@ <h1>Environments<a class="headerlink" href="#environments" title="Permalink to t
     </div>
 </div>
 
+<div style="display: flex; align-items: center; margin-bottom: 15px;">
+    <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+        <a href="https://github.com/facebookresearch/minihack" target="_blank">
+            <img src="https://img.shields.io/github/stars/facebookresearch/minihack?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star MiniHack" width="100px">
+        </a>
+    </div>
+    <div>
+        <p><a href="https://github.com/facebookresearch/minihack">MiniHack Learning Environment</a> is a stripped-down version of NetHack with support for level editing and custom procedural generation.</p>
+    </div>
+</div>
+
 <div style="display: flex; align-items: center; margin-bottom: 15px;">
     <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
         <a href="https://github.com/danijar/crafter" target="_blank">
@@ -595,13 +633,12 @@ <h1>Environments<a class="headerlink" href="#environments" title="Permalink to t
 <h1>Current Limitations<a class="headerlink" href="#current-limitations" title="Permalink to this heading">#</a></h1>
 <ul class="simple">
 <li><p>No continuous action spaces (WIP)</p></li>
-<li><p>Pre-gymnasium Gym and PettingZoo only (WIP)</p></li>
 <li><p>Support for heterogeneous observations and actions requires you to specify teams such that each team has the same observation and action space. There’s no good way around this.</p></li>
 </ul>
 </section>
 <section id="license">
 <h1>License<a class="headerlink" href="#license" title="Permalink to this heading">#</a></h1>
-<p>PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI; we do not have private repositories with additional utilities.</p>
+<p>PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public, and we do not have private repositories with additional utilities.</p>
 </section>
 
         </article>
diff --git a/docs/build/html/search.html b/docs/build/html/search.html
index 3a6aae3..1b40b0e 100644
--- a/docs/build/html/search.html
+++ b/docs/build/html/search.html
@@ -4,7 +4,7 @@
     <meta name="viewport" content="width=device-width,initial-scale=1"/>
     <meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="#" />
 
-    <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Search - PufferLib 0.4.3 documentation</title><link rel="stylesheet" type="text/css" href="_static/pygments.css" />
+    <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Search - PufferLib 0.5.0 documentation</title><link rel="stylesheet" type="text/css" href="_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
     <link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
@@ -187,7 +187,7 @@
       </label>
     </div>
     <div class="header-center">
-      <a href="index.html"><div class="brand">PufferLib 0.4.3 documentation</div></a>
+      <a href="index.html"><div class="brand">PufferLib 0.5.0 documentation</div></a>
     </div>
     <div class="header-right">
       <div class="theme-toggle-container theme-toggle-header">
@@ -210,7 +210,7 @@
       <div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
   
   
-  <span class="sidebar-brand-text">PufferLib 0.4.3 documentation</span>
+  <span class="sidebar-brand-text">PufferLib 0.5.0 documentation</span>
   
 </a><form class="sidebar-search-container" method="get" action="#" role="search">
   <input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -228,15 +228,17 @@
 <p class="caption" role="heading"><span class="caption-text">API</span></p>
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html">Emulation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="rst/api.html#registry">Registry</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#environments">Environments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#models">Models</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#vectorization">Vectorization</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#cleanrl-integration">CleanRL Integration</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/api.html#rllib-binding">RLlib Binding</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/api.html#sb3-binding">SB3 Binding</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
 
diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js
index 5a2c8be..3c89dff 100644
--- a/docs/build/html/searchindex.js
+++ b/docs/build/html/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["index", "rst/api", "rst/blog", "rst/landing"], "filenames": ["index.rst", "rst/api.rst", "rst/blog.rst", "rst/landing.rst"], "titles": ["Index", "Emulation", "PufferLib 0.4: Ready to Take on Bigger Fish", "Libraries"], "terms": {"librari": [0, 1, 2], "environ": [0, 1, 2], "current": [0, 1, 2], "limit": [0, 2], "licens": 0, "emul": [0, 3], "registri": [0, 3], "atari": [0, 2, 3], "butterfli": [0, 3], "classic": [0, 3], "control": 0, "crafter": [0, 3], "griddli": [0, 3], "magent": [0, 3], "micrort": [0, 3], "nethack": [0, 3], "neural": [0, 2, 3], "mmo": [0, 2, 3], "procgen": [0, 3], "model": [0, 2, 3], "vector": [0, 3], "cleanrl": [0, 3], "integr": [0, 2, 3], "rllib": [0, 2, 3], "bind": [0, 2], "pufferlib": [0, 1, 3], "0": [0, 1, 3], "4": [0, 1], "readi": 0, "take": [0, 1, 3], "bigger": 0, "fish": 0, "puffertank": [0, 3], "polici": [0, 1, 3], "error": [0, 1, 3], "handl": [0, 1, 3], "miscellan": 0, "2": [0, 1, 3], "big": 0, "problem": 0, "statement": 0, "demo": [0, 3], "next": [0, 3], "step": [0, 1, 3], "our": [1, 2, 3], "public": [1, 2], "api": [1, 2, 3], "advanc": 1, "user": [1, 3], "can": [1, 2, 3], "check": [1, 2, 3], "sourc": [1, 2, 3], "addit": [1, 2, 3], "util": [1, 2, 3], "note": [1, 3], "we": [1, 2, 3], "tend": 1, "move": 1, "around": [1, 3], "more": [1, 2, 3], "often": 1, "contribut": [1, 2], "welcom": 1, "wrap": [1, 2], "your": [1, 2, 3], "broad": 1, "compat": [1, 2], "support": [1, 2, 3], "pass": [1, 2, 3], "creator": [1, 3], "function": [1, 2, 3], "class": [1, 2, 3], "env": [1, 2, 3], "object": [1, 2, 3], "The": [1, 2, 3], "return": [1, 2, 3], "pufferenv": [1, 2], "same": [1, 2, 3], "gym": [1, 2, 3], "pettingzoo": [1, 2, 3], "gympufferenv": [1, 2, 3], "none": 1, "env_creat": [1, 2, 3], "env_arg": 1, "env_kwarg": 1, "postprocessor_cl": 1, "postprocessor": 1, "properti": 1, "observation_spac": 1, "flatten": [1, 2, 3], "singl": [1, 2, 3], "tensor": [1, 2, 3], "observ": [1, 2, 3], "space": [1, 2, 3], "action_spac": 1, "multi": 
[1, 3], "discret": [1, 2], "action": [1, 2, 3], "seed": 1, "set": [1, 2, 3], "thi": [1, 2, 3], "s": [1, 2, 3], "random": 1, "number": [1, 2, 3], "gener": [1, 2, 3], "some": [1, 2, 3], "us": [1, 2, 3], "multipl": [1, 2, 3], "pseudorandom": 1, "want": 1, "captur": 1, "all": [1, 2, 3], "order": [1, 2], "ensur": [1, 2, 3], "aren": 1, "t": [1, 3], "accident": 1, "correl": 1, "between": [1, 2, 3], "list": [1, 2, 3], "first": 1, "valu": [1, 3], "should": 1, "main": 1, "which": [1, 2, 3], "reproduc": 1, "equal": 1, "provid": [1, 2, 3], "won": 1, "true": [1, 3], "exampl": [1, 2, 3], "type": 1, "bigint": 1, "reset": [1, 2, 3], "an": [1, 2, 3], "initi": [1, 2], "state": [1, 2, 3], "variabl": [1, 2, 3], "sampl": 1, "independ": 1, "call": [1, 2], "In": [1, 2], "other": [1, 2, 3], "word": 1, "each": [1, 3], "yield": 1, "suitabl": [1, 2], "new": [1, 2, 3], "episod": 1, "previou": [1, 2], "execut": [1, 3], "reward": [1, 2, 3], "done": [1, 3], "info": [1, 2, 3], "close": [1, 3], "overrid": 1, "subclass": [1, 2], "perform": [1, 2], "ani": [1, 2, 3], "necessari": 1, "cleanup": 1, "automat": 1, "themselv": 1, "when": [1, 3], "garbag": 1, "collect": 1, "program": 1, "exit": 1, "unpack_batched_ob": [1, 3], "batched_ob": 1, "pettingzoopufferenv": [1, 2, 3], "postprocessor_kwarg": 1, "team": [1, 2, 3], "agent": [1, 2, 3], "single_observation_spac": [1, 3], "single_action_spac": [1, 3], "make_env": [1, 3], "includ": [1, 2, 3], "noopresetenv": 1, "noop_max": 1, "int": 1, "30": 1, "op": 1, "No": [1, 3], "assum": 1, "paramet": [1, 2], "maximum": 1, "run": [1, 2, 3], "kwarg": 1, "ndarrai": 1, "atarifeatur": 1, "is_multiag": 1, "agent_id": 1, "full": [1, 3], "access": [1, 2, 3], "mean": [1, 2], "you": [1, 2, 3], "them": [1, 2, 3], "cheat": 1, "don": [1, 3], "blame": 1, "do": [1, 2, 3], "ob": [1, 2, 3], "begin": 1, "after": 1, "must": 1, "chang": [1, 2, 3], "structur": [1, 2, 3], "reward_done_info": 1, "thei": [1, 2, 3], "ar": [1, 2, 3], "name": [1, 2], "framestack": 1, "creation": [1, 2], 
"default": [1, 2, 3], "preprocess": 1, "base": [1, 2, 3], "stabl": [1, 2, 3], "baselines3": 1, "wrapper": [1, 2, 3], "arg": 1, "stack": [1, 2], "three": 1, "convolut": 1, "follow": [1, 3], "linear": [1, 3], "layer": [1, 2, 3], "mandatori": 1, "keyword": 1, "argument": [1, 2, 3], "suggest": [1, 2, 3], "1": [1, 2, 3], "frame": 1, "lstm": 1, "without": [1, 2, 3], "make_knights_archers_zombies_v10": 1, "knight": 1, "archer": 1, "zombi": 1, "Not": 1, "yet": 1, "requir": [1, 2, 3], "heterogen": [1, 3], "make_cooperative_pong_v5": 1, "cooper": 1, "pong": 1, "classic_control": 1, "make_cartpole_env": 1, "cartpol": 1, "test": [1, 2, 3], "becaus": 1, "work": [1, 2, 3], "depend": [1, 2, 3], "crafterpostprocessor": 1, "featur": [1, 2], "griddlygympufferenv": 1, "need": [1, 2, 3], "defin": 1, "make_spider_v0_env": 1, "spider": 1, "wip": [1, 3], "specifi": [1, 2, 3], "until": 1, "creat": [1, 2], "off": 1, "dqn": 1, "critic": 1, "hidden": [1, 3], "encode_observ": [1, 3], "encod": [1, 3], "batch": [1, 2], "start": [1, 3], "unflatten": [1, 2], "origin": [1, 3], "form": 1, "self": [1, 2, 3], "structured_observation_spac": 1, "env_output": [1, 3], "flat_observ": 1, "A": 1, "shape": [1, 2, 3], "obs_siz": 1, "hidden_s": [1, 3], "lookup": 1, "embed": 1, "decode_act": [1, 3], "decod": [1, 3], "multidiscret": [1, 2], "flat_hidden": 1, "action_s": 1, "concaten": 1, "logit": 1, "dimens": 1, "It": [1, 3], "sum": 1, "nvec": [1, 3], "make_battle_v4_env": 1, "battl": 1, "appear": [1, 2, 3], "broken": 1, "crash": 1, "java": [1, 3], "learn": [1, 2, 3], "port": [1, 3], "from": [1, 2, 3], "nle": [1, 2, 3], "releas": [1, 2, 3], "concat": 1, "crop": 1, "height": 1, "width": 1, "height_target": 1, "width_target": 1, "helper": 1, "nethacknet": 1, "below": [1, 3], "intern": [1, 2], "modul": [1, 2, 3], "share": [1, 3], "both": [1, 2], "nn": [1, 3], "scriptmodul": 1, "forward": [1, 3], "input": 1, "coordin": 1, "calcul": 1, "center": 1, "given": 1, "x": 1, "y": 1, "b": 1, "h": 1, "w": 1, "train": [1, 2, 
3], "bool": 1, "nmmo": [1, 2, 3], "residualblock": 1, "channel": 1, "comput": [1, 3], "everi": [1, 3], "overridden": 1, "although": 1, "recip": 1, "within": 1, "one": [1, 2, 3], "instanc": 1, "afterward": 1, "instead": [1, 2], "sinc": [1, 2, 3], "former": 1, "care": [1, 3], "regist": 1, "hook": [1, 2], "while": [1, 2, 3], "latter": 1, "silent": 1, "ignor": 1, "convsequ": 1, "input_shap": 1, "out_channel": 1, "get_output_shap": 1, "procgenvecenv": 1, "env_nam": [1, 2], "num_env": 1, "num_level": 1, "start_level": 1, "distribution_mod": 1, "easi": [1, 2], "doe": [1, 2, 3], "normal": 1, "procgenpostprocessor": 1, "pure": 1, "pytorch": [1, 2, 3], "spec": 1, "allow": [1, 2, 3], "repackag": 1, "rl": [1, 2, 3], "framework": [1, 2, 3], "equival": 1, "flexibl": 1, "To": [1, 2, 3], "simpli": [1, 2], "put": 1, "everyth": 1, "befor": [1, 2], "recurr": [1, 2, 3], "cell": 1, "head": 1, "delet": 1, "its": 1, "specif": [1, 2, 3], "treat": 1, "tempor": 1, "data": 1, "bit": [1, 3], "differ": [1, 2, 3], "approach": [1, 2, 3], "let": [1, 2, 3], "write": [1, 3], "network": [1, 3], "element": 1, "output": [1, 2], "abstract": 1, "recurrentwrapp": [1, 3], "meant": 1, "debug": [1, 2, 3], "unlik": 1, "anyth": [1, 3], "relu": 1, "distribut": [1, 3], "backend": [1, 2, 3], "serial": [1, 2, 3], "num_work": [1, 2, 3], "envs_per_work": [1, 2, 3], "process": 1, "get": [1, 2, 3], "multiprocess": [1, 2, 3], "parallel": 1, "most": [1, 2, 3], "applic": 1, "rai": [1, 2, 3], "simul": [1, 2, 3], "cluster": 1, "also": [1, 2, 3], "faster": [1, 2], "than": [1, 2], "machin": [1, 2, 3], "non": [1, 3], "get_valu": 1, "get_action_and_valu": [1, 3], "see": [1, 3], "recurrentpolici": [1, 3], "under": [1, 3], "construct": 1, "focu": [1, 2], "v0": 1, "still": [1, 2, 3], "via": 1, "discord": [1, 2, 3], "offici": 1, "register_env": 1, "read_checkpoint": 1, "tune_path": 1, "create_polici": 1, "n": [1, 3], "make_polici": 1, "policy_cl": 1, "lstm_layer": 1, "implement": [1, 2, 3], "If": [1, 3], "rlpredictor": 1, 
"preprocessor": 1, "option": [1, 2, 3], "subclasses": 1, "predictor": 1, "__init__": [1, 3], "predict": 1, "infer": 1, "databatchtyp": 1, "These": [1, 2, 3], "_predict_panda": 1, "directli": [1, 2], "result": [1, 2], "callback": 1, "legacy_callbacks_dict": 1, "dict": 1, "str": 1, "callabl": 1, "on_train_result": 1, "algorithm": [1, 2, 3], "trainer": 1, "epoch": 1, "level": [1, 3], "on_episode_end": 1, "worker": [1, 2], "base_env": 1, "refer": [1, 3], "rollout": 1, "baseenv": 1, "underli": [1, 3], "sub": 1, "retriev": 1, "get_sub_environ": 1, "map": [1, 2], "id": 1, "mode": 1, "onli": [1, 2, 3], "default_polici": 1, "contain": [1, 2, 3], "user_data": 1, "store": [1, 3], "temporari": 1, "custom_metr": 1, "custom": [1, 2, 3], "metric": 1, "case": [1, 2, 3], "failur": 1, "mai": [1, 3], "except": 1, "thrown": 1, "finish": 1, "properli": 1, "logic": 1, "placehold": 1, "browser": [2, 3], "video": [2, 3], "out": [2, 3], "now": [2, 3], "make": [2, 3], "plai": [2, 3], "nice": [2, 3], "line": [2, 3], "pain": 2, "free": [2, 3], "click": [2, 3], "colab": [2, 3], "One": 2, "preload": 2, "common": 2, "importantli": 2, "have": [2, 3], "rewritten": 2, "entir": 2, "core": 2, "simplic": 2, "extens": 2, "flashi": 2, "notic": 2, "significantli": 2, "fewer": 2, "rough": 2, "edg": 2, "For": [2, 3], "longer": 2, "convert": [2, 3], "wysiwyg": 2, "previous": 2, "back": 2, "benefit": 2, "describ": 2, "blog": 2, "post": 2, "import": [2, 3], "def": [2, 3], "nmmo_creat": [2, 3], "nethack_cr": [2, 3], "expect": 2, "abov": [2, 3], "prefer": 2, "compar": 2, "vec": [2, 3], "Or": [2, 3], "synchron": [2, 3], "async": [2, 3], "async_reset": [2, 3], "_": [2, 3], "recv": [2, 3], "mani": [2, 3], "notori": [2, 3], "hard": 2, "up": [2, 3], "sever": [2, 3], "popular": [2, 3], "onto": 2, "imag": 2, "so": [2, 3], "build": [2, 3], "over": [2, 3], "coffe": 2, "break": [2, 3], "vanilla": [2, 3], "anoth": 2, "cleanrl_polici": [2, 3], "appli": 2, "expens": 2, "runtim": 2, "could": 2, "disabl": 2, "o": 2, "wa": [2, 
3], "inconveni": 2, "easili": 2, "forgotten": 2, "onc": 2, "startup": 2, "neglig": 2, "overhead": 2, "thu": 2, "far": [2, 3], "bug": 2, "version": [2, 3], "would": [2, 3], "been": 2, "caught": 2, "ad": 2, "sane": 2, "instal": [2, 3], "setup": [2, 3], "home": 2, "page": [2, 3], "updat": [2, 3], "save": [2, 3], "written": 2, "optim": [2, 3], "bottleneck": 2, "complex": [2, 3], "separ": 2, "interest": 2, "studi": 2, "python": 2, "experiment": 2, "deriv": 2, "correctli": 2, "pad": 2, "ha": [2, 3], "longstand": 2, "challeng": 2, "join": [2, 3], "tell": 2, "point": 2, "might": 2, "just": [2, 3], "fix": 2, "goal": 2, "reinforc": [2, 3], "game": [2, 3], "simpl": [2, 3], "preliminari": 2, "re": [2, 3], "excit": 2, "announc": 2, "dozen": 2, "better": 2, "streamlin": [2, 3], "understand": 2, "consid": 2, "determinist": 2, "fulli": [2, 3], "rel": [2, 3], "short": [2, 3], "time": [2, 3], "horizon": [2, 3], "contrast": 2, "nondeterminist": 2, "partial": 2, "larg": [2, 3], "popul": [2, 3], "hierarch": 2, "design": [2, 3], "mind": 2, "bia": 2, "toward": 2, "small": 2, "10": 2, "million": 2, "research": [2, 3], "tackl": 2, "lead": 2, "exclus": 2, "ran": 2, "file": [2, 3], "proxim": 2, "ppo": [2, 3], "replac": 2, "code": [2, 3], "eas": 2, "log": 2, "latest": [2, 3], "doubl": 2, "buffer": 2, "asynchron": [2, 3], "samplefactori": 2, "paper": [2, 3], "accuraci": 2, "maintain": [2, 3], "wandb": 2, "profil": 2, "baselin": [2, 3], "correct": 2, "kei": 2, "idea": 2, "behind": 2, "therebi": 2, "like": [2, 3], "perspect": 2, "nativ": 2, "here": [2, 3], "size": 2, "conform": 2, "lose": 2, "inform": 2, "right": 2, "constant": [2, 3], "sort": 2, "final": 2, "subtleti": 2, "multiag": [2, 3], "termin": 2, "signal": 2, "too": 2, "straightforward": 2, "env_cl": 2, "accept": 2, "certain": 2, "well": [2, 3], "abil": 2, "suppress": 2, "avoid": 2, "excess": 2, "across": 2, "split": [2, 3], "few": [2, 3], "technic": 2, "prone": 2, "difficult": [2, 3], "finicki": [2, 3], "costli": 2, "cpu": 2, "per": [2, 
3], "vecenv": 2, "rayvecenv": 2, "num_cor": 2, "adher": [2, 3], "perfectli": 2, "quickli": 2, "becom": 2, "cumbersom": 2, "outsid": 2, "ideal": 2, "scale": [2, 3], "beyond": [2, 3], "eat": 2, "trace": 2, "direct": 2, "remot": 2, "arbitrari": 2, "individu": 2, "At": 2, "shorter": 2, "simpler": 2, "convei": 2, "task": 2, "receiv": 2, "field": 2, "cover": 2, "subsequ": 2, "detail": [2, 3], "major": 2, "downsid": 2, "particularli": 2, "fast": 2, "itself": 2, "cap": 2, "hundr": 2, "thousand": 2, "second": 2, "price": 2, "paid": 2, "larger": 2, "emploi": 2, "techniqu": 2, "help": 2, "mitig": 2, "issu": [2, 3], "ultim": 2, "continu": [2, 3], "solv": 2, "repres": 2, "part": 2, "what": [2, 3], "tool": [2, 3], "plan": 2, "futur": 2, "develop": [2, 3], "add": [2, 3], "passthrough": 2, "There": [2, 3], "room": 2, "area": 2, "aim": 2, "commonli": 2, "method": 2, "histor": 2, "multiplay": 2, "skill": 2, "rate": 2, "curriculum": 2, "focus": 2, "mechan": 2, "earli": 2, "stage": 2, "howev": 2, "rapid": 2, "progress": 2, "gymnasium": [2, 3], "conflict": 2, "old": 2, "happen": 2, "slowli": 2, "increas": 2, "coverag": 2, "joseph": [2, 3], "suarez": [2, 3], "thank": 2, "ryan": 2, "sullivan": 2, "feedback": 2, "togeth": 3, "commun": 3, "discuss": 3, "my": 3, "twitter": 3, "star": 3, "repo": 3, "feed": 3, "puffer": 3, "whitepap": 3, "neurip": 3, "2023": 3, "alo": 3, "workshop": 3, "come": 3, "sai": 3, "hi": 3, "gpu": 3, "slow": 3, "tricki": 3, "clone": 3, "repositori": 3, "open": 3, "vscode": 3, "dev": 3, "plugin": 3, "docker": 3, "desktop": 3, "detect": 3, "devcontain": 3, "pip": 3, "avail": 3, "standard": 3, "packag": 3, "contributor": 3, "david": 3, "bloomin": 3, "pool": 3, "selector": 3, "nick": 3, "jenkin": 3, "layout": 3, "system": 3, "architectur": 3, "diagram": 3, "adversari": 3, "andranik": 3, "tigranyan": 3, "anim": 3, "pufferfish": 3, "hire": 3, "him": 3, "upwork": 3, "sara": 3, "earl": 3, "her": 3, "guid": 3, "notebook": 3, "button": 3, "top": 3, "heirarch": 3, "quirk": 3, 
"incompat": 3, "look": 3, "flat": 3, "how": 3, "two": 3, "enabl": 3, "otherwis": 3, "choos": 3, "varieti": 3, "interfac": 3, "sync": 3, "els": 3, "good": 3, "torch": 3, "numpi": 3, "np": 3, "super": 3, "prod": 3, "128": 3, "modulelist": 3, "value_head": 3, "reshap": 3, "dec": 3, "driver_env": 3, "lightweight": 3, "advantag": 3, "reli": 3, "conveni": 3, "complet": 3, "input_s": 3, "256": 3, "zero": 3, "almost": 3, "ll": 3, "hvae": 3, "unpack": 3, "pufferlif": 3, "sure": 3, "insid": 3, "flat_observation_spac": 3, "That": 3, "length": 3, "script": 3, "subject": 3, "ones": 3, "ve": 3, "easier": 3, "addition": 3, "portion": 3, "suit": 3, "80": 3, "academ": 3, "about": 3, "view": 3, "github": 3, "heavili": 3, "experi": 3, "manag": 3, "competit": 3, "try": 3, "purpos": 3, "industri": 3, "great": 3, "pre": 3, "veri": 3, "buggi": 3, "pin": 3, "recent": 3, "compatibl": 3, "situat": 3, "improv": 3, "box": 3, "alreadi": 3, "reason": 3, "openai": 3, "built": 3, "box2d": 3, "arcad": 3, "benchmark": 3, "massiv": 3, "combin": 3, "high": 3, "activ": 3, "me": 3, "project": 3, "platform": 3, "procedur": 3, "computation": 3, "effici": 3, "extrem": 3, "down": 3, "2d": 3, "minecraft": 3, "pixel": 3, "long": 3, "real": 3, "strategi": 3, "engin": 3, "configur": 3, "wai": 3, "softwar": 3, "mit": 3, "pufferai": 3, "privat": 3}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"index": 0, "user": 0, "guid": 0, "api": 0, "blog": 0, "emul": [1, 2], "registri": 1, "atari": 1, "butterfli": 1, "classic": 1, "control": 1, "crafter": 1, "griddli": 1, "magent": 1, "micrort": 1, "nethack": 1, "neural": 1, "mmo": 1, "procgen": 1, "model": 1, "vector": [1, 2], "cleanrl": [1, 2], "integr": 1, "rllib": 1, "bind": 1, "pufferlib": 2, "0": 2, "4": 2, "readi": 2, "take": 2, "bigger": 2, "fish": 2, "puffertank": 2, "polici": 2, "error": 2, "handl": 2, "miscellan": 2, "2": 2, "big": 2, "problem": 2, "statement": 2, "demo": 2, "next": 2, "step": 2, "librari": 3, "environ": 3, "current": 3, "limit": 
3, "licens": 3}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
+Search.setIndex({"docnames": ["index", "rst/api", "rst/blog", "rst/landing"], "filenames": ["index.rst", "rst/api.rst", "rst/blog.rst", "rst/landing.rst"], "titles": ["Index", "Emulation", "PufferLib 0.5: A Bigger EnvPool for Growing Puffers", "Libraries"], "terms": {"librari": [0, 2], "environ": [0, 2], "current": [0, 1, 2], "limit": [0, 2], "licens": 0, "emul": [0, 3], "model": [0, 2, 3], "vector": [0, 3], "cleanrl": [0, 3], "integr": [0, 2, 3], "rllib": [0, 2, 3], "bind": [0, 2], "sb3": [0, 3], "pufferlib": [0, 1, 3], "0": [0, 3], "5": 0, "A": [0, 1], "bigger": 0, "envpool": [0, 3], "grow": 0, "puffer": [0, 1, 3], "The": [0, 1, 3], "simul": [0, 1, 3], "crisi": 0, "solut": 0, "experi": [0, 3], "technic": 0, "detail": [0, 3], "gotcha": 0, "4": [0, 1, 3], "readi": 0, "take": [0, 1, 3], "fish": 0, "puffertank": [0, 3], "polici": [0, 1, 3], "error": 0, "handl": [0, 1, 3], "miscellan": 0, "2": [0, 3], "big": 0, "problem": 0, "statement": 0, "demo": [0, 3], "next": 0, "step": [0, 1, 3], "our": [1, 2, 3], "public": [1, 2, 3], "api": [1, 2, 3], "advanc": 1, "user": 1, "can": [1, 2, 3], "check": [1, 2], "sourc": [1, 2, 3], "addit": [1, 2, 3], "util": [1, 2, 3], "note": [1, 3], "we": [1, 2, 3], "tend": 1, "move": 1, "around": [1, 2, 3], "more": [1, 2], "often": 1, "contribut": [1, 2], "welcom": 1, "wrap": [1, 2, 3], "your": [1, 2, 3], "broad": 1, "compat": [1, 2], "support": [1, 2, 3], "pass": [1, 2, 3], "creator": [1, 3], "function": [1, 2, 3], "class": [1, 2, 3], "env": [1, 2, 3], "object": [1, 2, 3], "return": [1, 2, 3], "pufferenv": [1, 2], "same": [1, 2, 3], "gym": [1, 2, 3], "pettingzoo": [1, 2, 3], "gymnasiumpufferenv": [1, 3], "none": 1, "env_creat": [1, 2, 3], "env_arg": 1, "env_kwarg": 1, "postprocessor_cl": 1, "basicpostprocessor": 1, "properti": 1, "observation_spac": 1, "flatten": [1, 2, 3], "singl": [1, 2, 3], "tensor": [1, 2, 3], "observ": [1, 2, 3], "space": [1, 2, 3], "action_spac": 1, "multi": [1, 3], "discret": [1, 2], "action": [1, 2, 3], "seed": 1, 
"set": [1, 2, 3], "thi": [1, 2, 3], "s": [1, 2, 3], "random": 1, "number": [1, 2, 3], "gener": [1, 2, 3], "some": [1, 2, 3], "us": [1, 2, 3], "multipl": [1, 2, 3], "pseudorandom": 1, "want": [1, 2, 3], "captur": 1, "all": [1, 2, 3], "order": [1, 2], "ensur": [1, 2], "aren": 1, "t": [1, 2, 3], "accident": 1, "correl": 1, "between": [1, 2, 3], "list": [1, 2, 3], "first": [1, 2], "valu": [1, 3], "should": 1, "main": [1, 2], "which": [1, 2, 3], "reproduc": 1, "equal": [1, 2], "provid": [1, 2, 3], "won": 1, "true": [1, 2, 3], "exampl": [1, 2, 3], "type": 1, "bigint": 1, "reset": [1, 2, 3], "an": [1, 2, 3], "initi": [1, 2], "state": [1, 2], "method": [1, 2], "also": [1, 2, 3], "integ": 1, "ha": [1, 2, 3], "yet": 1, "If": [1, 2, 3], "alreadi": [1, 3], "call": [1, 2], "rng": 1, "moreov": 1, "typic": [1, 2], "case": [1, 2], "right": [1, 2], "after": 1, "never": 1, "again": [1, 2, 3], "info": [1, 2, 3], "option": [1, 2, 3], "dictionari": 1, "contain": [1, 2, 3], "extra": 1, "inform": [1, 2], "onli": [1, 2], "return_info": 1, "execut": [1, 3], "reward": [1, 2, 3], "done": [1, 2], "render": 1, "mode": 1, "vari": 1, "per": [1, 2, 3], "And": 1, "third": 1, "parti": 1, "mai": [1, 2, 3], "By": 1, "convent": 1, "human": 1, "displai": 1, "termin": [1, 2, 3], "noth": 1, "usual": 1, "consumpt": 1, "rgb_arrai": 1, "numpi": [1, 3], "ndarrai": 1, "shape": [1, 2, 3], "x": 1, "y": 1, "3": 1, "repres": [1, 2], "rgb": 1, "pixel": [1, 3], "imag": [1, 2], "suitabl": [1, 2], "turn": [1, 2], "video": [1, 2, 3], "ansi": 1, "string": 1, "str": 1, "stringio": 1, "style": 1, "text": 1, "represent": 1, "includ": [1, 2, 3], "newlin": 1, "escap": 1, "sequenc": 1, "e": 1, "g": 1, "color": 1, "make": [1, 2, 3], "sure": [1, 2], "metadata": 1, "render_mod": 1, "kei": [1, 2], "It": [1, 2, 3], "recommend": 1, "super": [1, 3], "implement": [1, 2, 3], "paramet": [1, 2], "myenv": 1, "def": [1, 2, 3], "self": [1, 2, 3], "np": [1, 3], "arrai": 1, "frame": 1, "elif": 1, "pop": 1, "up": [1, 2, 3], "window": 1, 
"els": 1, "just": [1, 2, 3], "rais": 1, "except": 1, "close": [1, 3], "overrid": 1, "subclass": [1, 2], "perform": [1, 2], "ani": [1, 2, 3], "necessari": 1, "cleanup": [1, 2], "automat": 1, "themselv": 1, "when": 1, "garbag": 1, "collect": [1, 3], "program": [1, 2], "exit": 1, "unpack_batched_ob": [1, 3], "batched_ob": 1, "pettingzoopufferenv": [1, 2, 3], "postprocessor": 1, "postprocessor_kwarg": 1, "team": [1, 2, 3], "agent": [1, 2, 3], "single_observation_spac": [1, 3], "single_action_spac": [1, 3], "expos": 1, "make_env": [1, 3], "one": [1, 2, 3], "you": [1, 2, 3], "most": [1, 2, 3], "time": [1, 2, 3], "other": [1, 2, 3], "interfac": [1, 2, 3], "them": [1, 2, 3], "so": [1, 2, 3], "static": 1, "refer": 1, "addition": [1, 2, 3], "baselin": [1, 2, 3], "have": [1, 2, 3], "custom": [1, 2, 3], "default": [1, 2, 3], "simpli": [1, 2], "befor": [1, 2, 3], "appli": [1, 2], "linear": [1, 2, 3], "layer": [1, 2, 3], "atari": [1, 2, 3], "procgen": [1, 3], "neural": [1, 2, 3], "mmo": [1, 2, 3], "nethack": [1, 3], "minihack": [1, 3], "pokemon": [1, 3], "red": [1, 3], "reason": [1, 3], "squar": [1, 2], "below": [1, 2, 3], "everyth": [1, 3], "through": [1, 3], "__init__": [1, 2, 3], "distance_to_target": 1, "num_target": 1, "1": [1, 2, 3], "torch": [1, 3], "alia": 1, "These": [1, 2, 3], "ar": [1, 2, 3], "requir": [1, 2, 3], "arg": 1, "kwarg": 1, "pure": 1, "pytorch": [1, 2, 3], "base": [1, 2, 3], "spec": [1, 2], "allow": [1, 2, 3], "repackag": 1, "rl": [1, 2, 3], "framework": [1, 2, 3], "encode_observ": 1, "decode_act": 1, "equival": [1, 2], "forward": [1, 2, 3], "structur": [1, 2, 3], "flexibl": [1, 3], "lstm": 1, "encod": [1, 3], "decod": [1, 3], "To": [1, 2, 3], "port": [1, 3], "put": 1, "from": [1, 2, 3], "recurr": [1, 2, 3], "cell": 1, "head": 1, "delet": 1, "its": 1, "specif": [1, 2, 3], "wrapper": [1, 2, 3], "sinc": [1, 2, 3], "each": [1, 2, 3], "treat": 1, "tempor": 1, "data": [1, 2], "bit": [1, 2, 3], "differ": [1, 2, 3], "approach": [1, 2, 3], "let": [1, 2], "write": 
[1, 3], "network": [1, 3], "specifi": [1, 2, 3], "critic": 1, "batch": [1, 2], "element": 1, "output": [1, 2], "abstract": 1, "flat_observ": 1, "hidden": [1, 3], "start": [1, 2, 3], "unflatten": [1, 2], "origin": [1, 2, 3], "form": 1, "structured_observation_spac": 1, "env_output": [1, 3], "obs_siz": 1, "hidden_s": 1, "lookup": 1, "embed": 1, "flat_hidden": 1, "multidiscret": [1, 2], "action_s": 1, "concaten": 1, "logit": 1, "dimens": 1, "sum": [1, 2], "nvec": [1, 3], "recurrentwrapp": 1, "meant": 1, "debug": [1, 2, 3], "run": [1, 2, 3], "unlik": 1, "learn": [1, 2, 3], "anyth": [1, 3], "relu": 1, "concat": 1, "convolut": 1, "stack": [1, 2], "three": 1, "follow": [1, 3], "framestack": 1, "mandatori": 1, "keyword": 1, "argument": [1, 2, 3], "suggest": [1, 2, 3], "without": [1, 2, 3], "distribut": [1, 3], "backend": [1, 2, 3], "serial": [1, 2, 3], "callabl": 1, "dict": 1, "num_env": [1, 3], "int": 1, "envs_per_work": [1, 2, 3], "envs_per_batch": 1, "env_pool": [1, 3], "bool": 1, "fals": [1, 2, 3], "process": [1, 2], "modul": [1, 2, 3], "flat_observation_spac": [1, 3], "ob": [1, 2, 3], "send": [1, 3], "recv": [1, 2, 3], "async_reset": [1, 2, 3], "profil": [1, 2], "get": [1, 2, 3], "multiprocess": [1, 2, 3], "parallel": [1, 2], "applic": 1, "rai": [1, 2, 3], "cluster": 1, "faster": [1, 2, 3], "than": [1, 2, 3], "machin": [1, 2, 3], "non": [1, 3], "get_valu": 1, "get_action_and_valu": [1, 3], "add": [1, 2, 3], "pretti": [1, 2], "simpl": [1, 2, 3], "see": [1, 2, 3], "recurrentpolici": 1, "shelv": 1, "until": 1, "stabl": [1, 2], "come": [1, 3], "soon": [1, 3], "browser": [2, 3], "doe": [2, 3], "what": [2, 3], "reinforc": [2, 3], "cpu": 2, "wouldn": 2, "pack": 2, "box": [2, 3], "wai": [2, 3], "With": 2, "releas": [2, 3], "python": [2, 3], "solv": 2, "tl": 2, "dr": 2, "20": 2, "improv": [2, 3], "across": 2, "workload": 2, "2x": 2, "complex": [2, 3], "nativ": 2, "multiag": [2, 3], "enhanc": 2, "pip": [2, 3], "instal": [2, 3], "u": 2, "But": 2, "d": 2, "like": [2, 3], 
"behind": 2, "curtain": 2, "read": 2, "do": [2, 3], "research": [2, 3], "sai": 2, "1000": 2, "second": 2, "core": 2, "5000": 2, "6": 2, "now": 2, "decid": 2, "work": [2, 3], "interest": 2, "happen": 2, "upon": 2, "brilliant": 2, "project": [2, 3], "must": 2, "been": 2, "develop": [2, 3], "truli": 2, "fantast": 2, "1500": 2, "scale": [2, 3], "1800": 2, "give": [2, 3], "speed": 2, "even": 2, "thei": [2, 3], "did": 2, "mani": [2, 3], "modern": 2, "overhead": 2, "mostli": 2, "launch": 2, "synchron": [2, 3], "roughli": 2, "ms": 2, "At": 2, "100": 2, "ignor": 2, "10": 2, "000": 2, "factor": 2, "varianc": 2, "defin": 2, "ratio": 2, "mu": 2, "std": 2, "root": 2, "For": [2, 3], "24": 2, "300": 2, "especi": 2, "intel": 2, "desktop": [2, 3], "seri": 2, "processor": 2, "featur": [2, 3], "slower": 2, "latenc": 2, "taken": 2, "transfer": 2, "gpu": [2, 3], "part": 2, "multiprocesss": 2, "naiv": 2, "leav": 2, "idl": 2, "dure": 2, "infer": 2, "As": 2, "rule": 2, "thumb": 2, "becaus": 2, "code": [2, 3], "alwai": 2, "thing": 2, "ones": [2, 3], "variabl": [2, 3], "depend": [2, 3], "On": 2, "hand": 2, "effect": 2, "100x": 2, "fewer": 2, "lower": 2, "2000": 2, "sp": 2, "reduc": 2, "impact": 2, "over": [2, 3], "sampl": [2, 3], "In": 2, "enabl": [2, 3], "interact": 2, "optim": [2, 3], "buffer": 2, "while": [2, 3], "techniqu": 2, "wa": [2, 3], "introduc": 2, "http": 2, "github": [2, 3], "com": 2, "alex": 2, "petrenko": 2, "factori": 2, "interleav": 2, "two": [2, 3], "good": [2, 3], "trick": 2, "supersed": 2, "final": 2, "simpler": 2, "pool": [2, 3], "finish": 2, "might": 2, "64": 2, "comput": [2, 3], "go": 2, "still": [2, 3], "select": 2, "fastest": 2, "40": 2, "sail": 2, "sg": 2, "massiv": [2, 3], "c": 2, "arbitrari": 2, "evalu": 2, "i": 2, "am": 2, "13900k": 2, "max": 2, "maingear": 2, "minim": 2, "debian": 2, "12": 2, "test": [2, 3], "9": 2, "1e": 2, "mean": 2, "delai": 2, "spawn": 2, "96": 2, "192": 2, "gymnasium": [2, 3], "separ": 2, "post": 2, "sweep": 2, "consid": 2, "yield": 2, 
"anoth": 2, "rel": [2, 3], "group": 2, "bar": 2, "gymasium": 2, "match": 2, "best": 2, "high": [2, 3], "standard": [2, 3], "deviat": 2, "import": [2, 3], "instanc": 2, "copi": 2, "heavi": 2, "save": 2, "dip": 2, "50": 2, "absolut": 2, "unavoid": 2, "reimplement": 2, "entir": 2, "architectur": [2, 3], "possibl": 2, "10734": 2, "36": 2, "delay_mean": 2, "delay_std": 2, "num_work": 2, "batch_siz": 2, "sync": 2, "11640": 2, "42": 2, "32715": 2, "65": 2, "27635": 2, "31": 2, "22681": 2, "48": 2, "26183": 2, "73": 2, "30120": 2, "75": 2, "out": [2, 3], "cap": 2, "worker": [2, 3], "There": [2, 3], "room": 2, "small": 2, "matter": 2, "much": [2, 3], "extrem": [2, 3], "concis": 2, "800": 2, "line": [2, 3], "ad": 2, "chang": [2, 3], "lot": 2, "don": [2, 3], "queue": 2, "fast": 2, "poll": 2, "instead": 2, "pipe": 2, "selector": [2, 3], "seen": 2, "notic": 2, "my": [2, 3], "sleep": 2, "trigger": 2, "context": 2, "switch": 2, "spent": 2, "process_tim": 2, "slice": 2, "count": 2, "150m": 2, "fix": 2, "amount": 2, "easi": 2, "thank": 2, "wait": 2, "unfortun": 2, "too": 2, "slow": [2, 3], "wish": 2, "itself": 2, "issu": 2, "caus": 2, "crash": 2, "why": 2, "result": 2, "mention": 2, "peopl": 2, "look": [2, 3], "experiment": 2, "procedur": [2, 3], "stuff": 2, "paradigm": 2, "actual": 2, "plai": [2, 3], "nice": [2, 3], "pain": 2, "free": [2, 3], "click": [2, 3], "colab": [2, 3], "new": [2, 3], "One": 2, "preload": 2, "common": 2, "importantli": 2, "rewritten": 2, "simplic": 2, "extens": 2, "flashi": 2, "significantli": 2, "rough": 2, "edg": 2, "longer": 2, "convert": [2, 3], "intern": 2, "wysiwyg": 2, "previous": [2, 3], "creation": 2, "back": [2, 3], "benefit": 2, "describ": 2, "blog": 2, "nle": [2, 3], "nmmo": [2, 3], "nmmo_creat": [2, 3], "nethack_cr": [2, 3], "gympufferenv": 2, "expect": 2, "abov": [2, 3], "prefer": 2, "directli": 2, "compar": 2, "vec": [2, 3], "Or": [2, 3], "async": [2, 3], "_": 2, "notori": [2, 3], "hard": 2, "sever": [2, 3], "popular": [2, 3], "onto": 2, 
"build": [2, 3], "coffe": 2, "break": [2, 3], "vanilla": [2, 3], "cleanrl_polici": [2, 3], "expens": 2, "runtim": 2, "could": 2, "disabl": 2, "o": 2, "inconveni": 2, "easili": 2, "forgotten": 2, "onc": 2, "startup": 2, "neglig": 2, "thu": 2, "far": [2, 3], "bug": 2, "version": [2, 3], "would": [2, 3], "caught": 2, "previou": 2, "sane": 2, "setup": [2, 3], "home": 2, "page": [2, 3], "updat": [2, 3], "written": 2, "bottleneck": 2, "studi": 2, "deriv": 2, "correctli": 2, "train": [2, 3], "pad": 2, "longstand": 2, "challeng": 2, "join": [2, 3], "discord": [2, 3], "tell": 2, "point": 2, "goal": 2, "game": [2, 3], "preliminari": 2, "re": [2, 3], "excit": 2, "announc": 2, "dozen": 2, "better": 2, "streamlin": [2, 3], "understand": 2, "need": [2, 3], "determinist": 2, "fulli": [2, 3], "short": [2, 3], "horizon": [2, 3], "contrast": 2, "nondeterminist": 2, "partial": 2, "larg": [2, 3], "popul": [2, 3], "hierarch": 2, "design": [2, 3], "mind": 2, "bia": 2, "toward": 2, "million": 2, "tackl": 2, "lead": 2, "focu": 2, "exclus": 2, "ran": 2, "file": [2, 3], "proxim": 2, "ppo": [2, 3], "replac": 2, "eas": 2, "log": 2, "latest": 2, "doubl": 2, "asynchron": [2, 3], "samplefactori": 2, "paper": 2, "accuraci": 2, "maintain": [2, 3], "wandb": 2, "correct": 2, "idea": 2, "appear": 2, "therebi": 2, "perspect": 2, "here": [2, 3], "size": 2, "conform": 2, "lose": 2, "both": 2, "constant": [2, 3], "sort": 2, "subtleti": 2, "signal": 2, "creat": 2, "straightforward": 2, "name": 2, "env_cl": 2, "env_nam": 2, "accept": 2, "certain": 2, "hook": [2, 3], "well": [2, 3], "abil": 2, "suppress": 2, "avoid": 2, "excess": 2, "split": [2, 3], "few": 2, "prone": 2, "difficult": [2, 3], "finicki": [2, 3], "costli": 2, "vecenv": 2, "rayvecenv": 2, "num_cor": 2, "adher": [2, 3], "perfectli": 2, "quickli": 2, "becom": 2, "cumbersom": 2, "outsid": 2, "ideal": 2, "beyond": [2, 3], "eat": 2, "trace": 2, "direct": 2, "access": 2, "remot": 2, "individu": 2, "shorter": 2, "map": 2, "convei": 2, "task": 2, 
"receiv": [2, 3], "field": 2, "cover": 2, "subsequ": 2, "major": 2, "downsid": 2, "particularli": 2, "hundr": 2, "thousand": 2, "price": 2, "paid": 2, "larger": 2, "emploi": 2, "help": 2, "mitig": 2, "ultim": 2, "continu": [2, 3], "tool": [2, 3], "plan": 2, "futur": [2, 3], "passthrough": 2, "area": 2, "algorithm": [2, 3], "aim": 2, "commonli": 2, "histor": 2, "multiplay": 2, "skill": 2, "rate": 2, "curriculum": 2, "focus": 2, "mechan": 2, "earli": 2, "stage": 2, "howev": 2, "rapid": 2, "progress": 2, "conflict": 2, "old": 2, "slowli": 2, "increas": 2, "coverag": 2, "joseph": [2, 3], "suarez": [2, 3], "ryan": 2, "sullivan": 2, "feedback": 2, "togeth": 3, "commun": 3, "discuss": 3, "twitter": 3, "star": 3, "repo": 3, "feed": 3, "whitepap": 3, "neurip": 3, "2023": 3, "alo": 3, "workshop": 3, "registri": 3, "tricki": 3, "clone": 3, "repositori": 3, "open": 3, "vscode": 3, "dev": 3, "plugin": 3, "docker": 3, "detect": 3, "devcontain": 3, "avail": 3, "packag": 3, "contributor": 3, "david": 3, "bloomin": 3, "store": 3, "nick": 3, "jenkin": 3, "layout": 3, "system": 3, "diagram": 3, "adversari": 3, "andranik": 3, "tigranyan": 3, "anim": 3, "pufferfish": 3, "hire": 3, "him": 3, "upwork": 3, "sara": 3, "earl": 3, "her": 3, "guid": 3, "notebook": 3, "button": 3, "top": 3, "heirarch": 3, "quirk": 3, "incompat": 3, "everi": 3, "flat": 3, "how": 3, "pettingzootruncatedwrapp": 3, "compliant": 3, "loss": 3, "underli": 3, "otherwis": 3, "choos": 3, "varieti": 3, "share": 3, "total": 3, "tweak": 3, "ship": 3, "care": 3, "nn": 3, "prod": 3, "128": 3, "modulelist": 3, "n": 3, "value_head": 3, "reshap": 3, "dec": 3, "driver_env": 3, "truncat": 3, "env_id": 3, "mask": 3, "reli": 3, "conveni": 3, "complet": 3, "almost": 3, "ll": 3, "unpack": 3, "flat_observation_structur": 3, "That": 3, "full": 3, "length": 3, "script": 3, "subject": 3, "ve": 3, "easier": 3, "portion": 3, "suit": 3, "80": 3, "academ": 3, "about": 3, "view": 3, "heavili": 3, "manag": 3, "competit": 3, "try": 3, "purpos": 
3, "industri": 3, "veri": 3, "buggi": 3, "situat": 3, "openai": 3, "built": 3, "box2d": 3, "gameboi": 3, "activ": 3, "butterfli": 3, "arcad": 3, "classic": 3, "benchmark": 3, "minigrid": 3, "2d": 3, "grid": 3, "world": 3, "engin": 3, "builtin": 3, "target": 3, "computation": 3, "effici": 3, "magent": 3, "platform": 3, "combin": 3, "me": 3, "level": 3, "strip": 3, "down": 3, "edit": 3, "crafter": 3, "minecraft": 3, "long": 3, "griddli": 3, "micrort": 3, "real": 3, "strategi": 3, "java": 3, "configur": 3, "No": 3, "wip": 3, "heterogen": 3, "softwar": 3, "under": 3, "mit": 3, "pufferai": 3, "branch": 3, "privat": 3}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"index": 0, "user": 0, "guid": 0, "api": 0, "blog": 0, "emul": [1, 2], "environ": [1, 3], "model": 1, "vector": [1, 2], "cleanrl": [1, 2], "integr": 1, "rllib": 1, "bind": 1, "sb3": 1, "pufferlib": 2, "0": 2, "5": 2, "A": 2, "bigger": 2, "envpool": 2, "grow": 2, "puffer": 2, "The": 2, "simul": 2, "crisi": 2, "solut": 2, "experi": 2, "technic": 2, "detail": 2, "gotcha": 2, "4": 2, "readi": 2, "take": 2, "fish": 2, "puffertank": 2, "polici": 2, "error": 2, "handl": 2, "miscellan": 2, "2": 2, "big": 2, "problem": 2, "statement": 2, "demo": 2, "next": 2, "step": 2, "librari": 3, "current": 3, "limit": 3, "licens": 3}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
diff --git a/docs/source/_static/0-5_blog_envpool.png b/docs/source/_static/0-5_blog_envpool.png
new file mode 100644
index 0000000..46a472e
Binary files /dev/null and b/docs/source/_static/0-5_blog_envpool.png differ
diff --git a/docs/source/_static/0-5_blog_header.png b/docs/source/_static/0-5_blog_header.png
new file mode 100644
index 0000000..1c49024
Binary files /dev/null and b/docs/source/_static/0-5_blog_header.png differ
diff --git a/docs/source/rst/api.rst b/docs/source/rst/api.rst
index 505e364..58dc7e0 100644
--- a/docs/source/rst/api.rst
+++ b/docs/source/rst/api.rst
@@ -9,7 +9,7 @@ Emulation
 
 Wrap your environments for broad compatibility. Supports passing creator functions, classes, or env objects. The API of the returned PufferEnv is the same as Gym/PettingZoo.
 
-.. autoclass:: pufferlib.emulation.GymPufferEnv
+.. autoclass:: pufferlib.emulation.GymnasiumPufferEnv
    :members:
    :undoc-members:
    :noindex:
@@ -19,93 +19,21 @@ Wrap your environments for broad compatibility. Supports passing creator functio
    :undoc-members:
    :noindex:
 
-Registry
-########
+Environments
+############
 
-make_env functions and policies for included environments.
+All included environments expose ``make_env`` and ``env_creator`` functions. ``make_env`` is the one that you want most of the time. ``env_creator`` exposes e.g. class interfaces for environments that support them so that you can pass around static references.
 
-Atari
-*****
-
-.. automodule:: pufferlib.registry.atari
-   :members:
-   :undoc-members:
-   :noindex:
+Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have *custom* policies, and the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, Nethack/Minihack, and Pokemon Red currently have reasonable policies.
 
+The PufferLib Squared environment is used as an example below. Everything is exposed through ``__init__``, so you can call these methods through e.g. ``pufferlib.environments.squared.make_env``.
 
-Butterfly
-*********
-
-.. automodule:: pufferlib.registry.butterfly
+.. automodule:: pufferlib.environments.squared.environment
    :members:
    :undoc-members:
    :noindex:
 
-
-Classic Control
-***************
-
-.. automodule:: pufferlib.registry.classic_control
-   :members:
-   :undoc-members:
-   :noindex:
-
-Crafter
-*******
-
-.. automodule:: pufferlib.registry.crafter
-   :members:
-   :undoc-members:
-   :noindex:
-
-Griddly
-*******
-
-.. automodule:: pufferlib.registry.griddly
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-MAgent
-******
-
-.. automodule:: pufferlib.registry.magent
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-MicroRTS
-********
-
-.. automodule:: pufferlib.registry.microrts
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-NetHack
-*******
-
-.. automodule:: pufferlib.registry.nethack
-   :members:
-   :undoc-members:
-   :noindex:
-
-
-Neural MMO
-**********
-
-.. automodule:: pufferlib.registry.nmmo
-   :members:
-   :undoc-members:
-   :noindex:
-
-Procgen
-*******
-
-.. automodule:: pufferlib.registry.procgen
+.. autoclass:: pufferlib.environments.squared.torch.Policy
    :members:
    :undoc-members:
    :noindex:
@@ -113,7 +41,7 @@ Procgen
 Models
 ######
 
-PufferLib model API and default policies
+PufferLib model default policies and optional API. These are not required to use PufferLib.
 
 .. automodule:: pufferlib.models
    :members:
@@ -150,7 +78,7 @@ Wrap your PyTorch policies for use with CleanRL
    :undoc-members:
    :noindex:
 
-Recurrence requires you to subclass our base policy instead. See the default policies for examples.
+Wrap your PyTorch policies for use with CleanRL but add an LSTM. This requires you to use our policy API. It's pretty simple -- see the default policies for examples.
 
 .. autoclass:: pufferlib.frameworks.cleanrl.RecurrentPolicy
    :members:
@@ -160,9 +88,14 @@ Recurrence requires you to subclass our base policy instead. See the default pol
 RLlib Binding
 #############
 
-Wrap your policies for use with RLlib (WIP)
+Wrap your policies for use with RLlib (Shelved until RLlib is more stable)
 
 .. automodule:: pufferlib.frameworks.rllib
    :members:
    :undoc-members:
-   :noindex:
\ No newline at end of file
+   :noindex:
+
+SB3 Binding
+###########
+
+Coming soon!
diff --git a/docs/source/rst/blog.rst b/docs/source/rst/blog.rst
index 86b1d2a..ee6f910 100644
--- a/docs/source/rst/blog.rst
+++ b/docs/source/rst/blog.rst
@@ -11,6 +11,73 @@
      </video>
    </center>
 
+PufferLib 0.5: A Bigger EnvPool for Growing Puffers
+###################################################
+
+This is what reinforcement learning does to your CPU utilization:
+
+.. figure:: ../_static/0-5_blog_header.png
+
+You wouldn’t pack a box this way, right? With PufferLib 0.5, we are releasing a Python implementation of EnvPool to solve this problem. **TL;DR: ~20% performance improvement across most workloads, up to 2x for complex environments, and native multiagent support.**
+
+.. figure:: ../_static/0-5_blog_envpool.png
+
+If you just want the enhancements, you can pip install -U pufferlib. But if you’d like to see a bit behind the curtain, read on!
+
+The Simulation Crisis
+*********************
+
+You want to do some RL research, so you install Atari. Say it runs at 1000 steps/second on 1 core and 5000 steps/second on 6 cores. Now, you decide you want to work on a more interesting environment and happen upon Neural MMO, a brilliant project that must have been developed by a truly fantastic team. It runs at 1500 steps/second – faster than Atari! So you scale it up to 6 cores and it runs at … 1800 steps per second. What gives?
+
+The problem is that environments simulated on different cores do not run at the same speed. Even if they did, many modern CPUs have cores that run at different speeds. Parallelization overhead is mostly the sum of:
+
+- Launching/synchronization overhead. This is roughly 0.1 ms per process and is linear in the number of processes. At ~100 steps per second, you can ignore it. At >10,000 steps/second, it is the main limiting factor.
+- Environment variance. This is defined by the ratio std/mu of the environment simulation time and scales with the square root of the number of processes. For 24 processes, 10% std is 20% overhead and 100% std is 300% overhead.
+- Different core speeds. Many modern CPUs, especially Intel desktop series processors, feature additional cores that are ~20% slower than the main cores.
+- Model latency. This is the time taken to transfer observations to GPU, run the model, and transfer actions to CPU. It is not technically part of multiprocessing overhead, but naive implementations will leave CPUs idle during model inference.
+
+As a rule of thumb, simple RL environments have < 10% variance because the code is always simulating roughly the same thing. Complex environments, especially ones with variable numbers of agents, can have > 100% variance because different code runs depending on the current state. On the other hand, if your environment has 100 agents, you are effectively running 100x fewer simulations for the same data, so launching/synchronization overhead is lower.
+
+The Solution
+************
+
+Run multiple environments per process if your environment runs faster than ~2000 steps per second (sps) with variance < ~10%. This reduces the impact of launching/synchronization overhead and also reduces variance because you are summing over samples. In PufferLib, we typically enable this only for environments > ~5000 sps because of interactions with the optimizations below.
+
+Simulate multiple buffers of environments so that one buffer is running while your model is processing observations from the other. This technique was introduced by https://github.com/alex-petrenko/sample-factory and does not speed up simulation, but it allows you to interleave simulations from two sets of environments. It’s a good trick, but it is superseded by the final optimization, which is faster and simpler.
+
+Run a pool of environments and sample from the first ones to finish stepping. For example, if you want a batch of 24 observations, you might run 64 environments. At each step, the 24 for which you have computed actions are going to take a while to simulate, but you can still select the fastest 24 from the other 64-24=40 environments. This technique was introduced by https://github.com/sail-sg/envpool and is massively effective, but the original implementation is only for specific C/C++ environments. PufferLib’s implementation is in Python, so it is slower, but it works for arbitrary Python environments and includes native multiagent support.
+
+Experiments
+***********
+
+To evaluate the performance of different backends, I am using a 13900k (24 cores) on a max specced Maingear desktop running a minimal Debian 12 install. We test 9 different simulated environments: 1e-2 to 1e-4 mean delay with 0-100% delay std. For each environment, we spawn 1, 6, 24, 96, and 192 processes for each backend tested (Gymnasium’s and PufferLib’s serial and multiprocessing implementations + PufferLib’s pool). We also have Ray implementations compatible with our pooling code, but that will be a separate post. Additionally, PufferLib implementations sweep over (1, 2, 4) environments per process and PufferLib pool will compute 24 observations at a time. We do not consider model latency, which can yield another 2x relative performance for pooling on specific workloads.
+
+.. figure:: ../_static/0-5_blog_envpool.png
+
+9 groups of bars, each for one environment. 5 groups of bars per environment, each for a specific number of processes. The serial Gymnasium/PufferLib experiments match in all cases. The best PufferLib settings are 10-20% faster than the best Gymnasium settings for all workloads and can be up to 2x faster for environments with a high standard deviation in important cases (for instance, you may not want to run 192 copies of heavy environments). Again, this is before even considering the time saved by interleaving with the model forward pass.
+
+All of the implementations start to dip ~10% at 1,000 steps/second and ~50% at 10,000 steps/second. To make absolutely sure that this overhead is unavoidable, I reimplemented the entire pool architecture as minimally as possible, without any of the environment wrapper or data transfer overhead:
+
+.. code-block:: text
+
+  SPS: 10734.36 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: False
+  SPS: 11640.42 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 1 batch_size: 1 sync: True
+  SPS: 32715.65 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: False
+  SPS: 27635.31 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 6 batch_size: 6 sync: True
+  SPS: 22681.48 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: False
+  SPS: 26183.73 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 24 sync: False
+  SPS: 30120.75 envs_per_worker: 1 delay_mean: 0 delay_std: 0 num_workers: 24 batch_size: 6 sync: True
+
+As it turns out, Python’s multiprocessing caps around 10,000 steps per second per worker. There is still room for improvement by running more environments per process, but at this speed, small optimizations to the data processing code start to matter much more.
+
+Technical Details and Gotchas
+*****************************
+
+PufferLib’s vectorization library is extremely concise – around 800 lines for the serial, multiprocessing, and Ray backends, with support for PufferLib’s Gymnasium and PettingZoo wrappers. Adding EnvPool only required changing around 100 lines of code, but it required a lot of performance testing:
+
+- Don’t use ``multiprocessing.Queue``. There’s no fast way to poll which processes are done. Instead, use ``multiprocessing.Pipe`` and poll with ``selectors``. I have not seen noticeable overhead from this in any of my tests.
+- To simulate environment delay in benchmarks, don’t use ``time.sleep()``, as this will trigger context switching, or ``time.time()``, as this will include time spent on other processes. Use ``time.process_time()`` if you want an equal slice per core, or count to ~150M/second (time it on your machine) if you want a fixed amount of work.
+
+The Ray backend was extremely easy to implement thanks to ``ray.wait()``. It is unfortunately too slow for most environments, but I wish standard multiprocessing used the Ray API, if not the architecture. The library itself has some cleanup issues that can cause crashes during heavy performance tests, which is why results are not included in this post.
+
+There’s one other thing I want to mention for people looking at the code. I was experimenting with a procedural style while testing different programming paradigms, so the actual class interfaces are in ``__init__``. It’s pretty much equivalent to one subclass per backend.
+
 PufferLib 0.4: Ready to Take on Bigger Fish
 ###########################################
 
diff --git a/docs/source/rst/landing.rst b/docs/source/rst/landing.rst
index 4b4661f..d4d72ab 100644
--- a/docs/source/rst/landing.rst
+++ b/docs/source/rst/landing.rst
@@ -44,7 +44,7 @@ You have an environment, a PyTorch model, and a reinforcement learning library t
 
 |
 
-Join our community Discord for support and Discussion, follow my Twitter for news, and star the repo to feed the puffer. :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` appearing at NeurIPS 2023 ALOE Workshop. Come say hi!
+Join our community Discord for support and discussion, follow my Twitter for news, and star the repo to feed the puffer. We also have a :download:`Whitepaper <../_static/neurips_2023_aloe.pdf>` featured at the NeurIPS 2023 ALOE workshop.
 
 .. dropdown:: Installation
 
@@ -54,7 +54,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
       `PufferTank <https://github.com/pufferai/puffertank>`_ is a GPU container with PufferLib and dependencies for all environments in the registry, including some that are slow and tricky to install.
 
-      If you are new to containers, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.
+      If you have not used containers before and just want everything to work, clone the repository and open it in VSCode. You will need to install the Dev Container plugin as well as Docker Desktop. VSCode will then detect the settings in .devcontainer and set up the container for you.
 
     .. tab-item:: Pip
 
@@ -76,7 +76,7 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
    **Joseph Suarez**: Creator and developer of PufferLib
 
-   **David Bloomin**: Policy pool/store/selector
+   **David Bloomin**: 0.4 policy pool/store/selector
 
    **Nick Jenkins**: Layout for the system architecture diagram. Adversary.design.
 
@@ -86,40 +86,45 @@ Join our community Discord for support and Discussion, follow my Twitter for new
 
 **You can open this guide in a Colab notebook by clicking the demo button at the top of this page**
 
-Complex environments may have heirarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations and actions and a constant number of agents, with no changes to the underlying environment. Here's how it works with two notoriously complex environments, NetHack and Neural MMO.
+Complex environments may have hierarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations/actions and a constant number of agents. Here's how it works with NetHack and Neural MMO, two notoriously complex environments.
 
 .. code-block:: python
 
   import pufferlib.emulation
+  import pufferlib.wrappers
 
   import nle, nmmo
 
   def nmmo_creator():
-      return pufferlib.emulation.PettingZooPufferEnv(env_creator=nmmo.Env)
+      env = nmmo.Env()
+      env = pufferlib.wrappers.PettingZooTruncatedWrapper(env)
+      return pufferlib.emulation.PettingZooPufferEnv(env=env)
 
   def nethack_creator():
-      return pufferlib.emulation.GymPufferEnv(env_creator=nle.env.NLE)
+      return pufferlib.emulation.GymnasiumPufferEnv(env_creator=nle.env.NLE)
 
-You can pass envs by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.
+The wrappers give you back a Gymnasium/PettingZoo compliant environment. There is no loss of generality and no change to the underlying environment. You can wrap environments by class, creator function, or object, with or without additional arguments. These wrappers enable us to make some optimizations to vectorization code that would be difficult to implement otherwise. You can choose from a variety of vectorization backends. They all share the same interface with synchronous and asynchronous options.
 
 .. code-block:: python
 
   import pufferlib.vectorization
 
-  # vec = pufferlib.vectorization.Serial
-  vec = pufferlib.vectorization.Multiprocessing
+  vec = pufferlib.vectorization.Serial
+  # vec = pufferlib.vectorization.Multiprocessing
   # vec = pufferlib.vectorization.Ray
 
-  envs = vec(nmmo_creator, num_workers=2, envs_per_worker=2)
+  # Vectorization API. Specify total number of environments and number per worker
+  # Setting env_pool=True can be much faster but requires some tweaks to learning code
+  envs = vec(nmmo_creator, num_envs=4, envs_per_worker=2, env_pool=False)
 
-  sync = True
-  if sync:
-      obs = envs.reset()
-  else:
-      envs.async_reset()
-      obs, _, _, _ = envs.recv()
+  # Synchronous API - reset/step
+  # obs = envs.reset()[0]
 
-We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.
+  # Asynchronous API - async_reset, send/recv
+  envs.async_reset()
+  obs = envs.recv()[0]
+
+Our backends support asynchronous on-policy sampling through a Python implementation of EnvPool. This makes them *faster* than the implementations that ship with most RL libraries. We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.
 
 PufferLib allows you to write vanilla PyTorch policies and use them with multiple learning libraries. We take care of the details of converting between the different APIs. Here's a policy that will work with *any* environment, with a one-line wrapper for CleanRL.
 
@@ -132,7 +137,7 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl
   import pufferlib.frameworks.cleanrl
 
   class Policy(nn.Module):
-      def __init__(self, envs):
+      def __init__(self, env):
           super().__init__()
           self.encoder = nn.Linear(np.prod(
               envs.single_observation_space.shape), 128)
@@ -151,12 +156,10 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl
   policy = Policy(envs.driver_env)
   cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
   actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
-  obs, rewards, dones, infos = envs.step(actions)
+  obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
   envs.close()
 
-There's also a lightweight, fully optional base policy class for PufferLib. It breaks the forward pass into two functions, encode_observations and decode_actions. The advantage of this is that it lets us handle recurrance for you, since every framework does this a bit differently.
-
-So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide a registry of environments and models. Here's a complete example.
+There's also an optional policy base class for PufferLib. It just breaks the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here's a complete example.
 
 .. code-block:: python
 
@@ -165,33 +168,32 @@ So far, the code above is fully general and does not rely on PufferLib support f
   import pufferlib.models
   import pufferlib.vectorization
   import pufferlib.frameworks.cleanrl
-  import pufferlib.registry.nmmo
+  import pufferlib.environments.nmmo
 
   envs = pufferlib.vectorization.Multiprocessing(
-      env_creator=pufferlib.registry.nmmo.make_env,
-      num_workers=2, envs_per_worker=2)
+      env_creator=pufferlib.environments.nmmo.make_env,
+      num_envs=4, envs_per_worker=2)
 
-  policy = pufferlib.registry.nmmo.Policy(envs.driver_env)
-  policy = pufferlib.models.RecurrentWrapper(envs, policy,
-      input_size=256, hidden_size=256)
-  cleanrl_policy = pufferlib.frameworks.cleanrl.RecurrentPolicy(policy)
+  policy = pufferlib.environments.nmmo.Policy(envs.driver_env)
+  cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
 
-  obs = envs.reset()
-  obs = torch.Tensor(obs)
-  state = [torch.zeros((1, 256, 256)), torch.zeros((1, 256, 256))]
-  actions = cleanrl_policy.get_action_and_value(obs, state)[0].numpy()
-  obs, rewards, dones, infos = envs.step(actions)
+  env_outputs = envs.reset()[0]
+  obs = torch.Tensor(env_outputs)
+  actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
+  obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
   envs.close()
 
-It's that simple -- almost. If you have an environment with structured observations, you'll hvae to unpack them in the network forward pass since PufferLif will flatten them in emulation. We provide a utility for this -- just be sure to save a reference to your environment inside of the model so you have access to the observation space.
+It's that simple -- almost. If you have an environment with structured observations, you'll have to unpack them in the network forward pass since PufferLib will flatten them in emulation. We provide a utility for this.
 
 .. code-block:: python
 
-  env_outputs = pufferlib.emulation.unpack_batched_obs(
-      env_outputs, self.envs.flat_observation_space
+  obs = pufferlib.emulation.unpack_batched_obs(
+      env_outputs,
+      envs.driver_env.flat_observation_space,
+      envs.driver_env.flat_observation_structure
   )
 
-That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration.
+That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. SB3 and other integrations are coming soon!
 
 Libraries
 #########
@@ -223,7 +225,7 @@ PufferLib provides *pufferlib.frameworks* for the the learning libraries below.
 
 Or view it on GitHub `here <https://github.com/PufferAI/PufferLib/blob/experimental/cleanrl_ppo_atari.py>`_
 
-We are also working on a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. It's still under development, but you can try it out `here <https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py>`_ 
+PufferLib also includes a heavily customized version of CleanRL PPO with support for recurrent and non-recurrent models, async environment execution, variable agent populations, self-play, and experiment management. This is the version we use for our research and the NeurIPS 2023 Neural MMO Competition. You can try it out `here <https://github.com/PufferAI/PufferLib/blob/experimental/clean_pufferl.py>`_ 
 
 .. raw:: html
 
@@ -238,12 +240,12 @@ We are also working on a heavily customized version of CleanRL PPO with support
         </div>
     </div>
 
-While RLlib is great on paper, there are currently a few issues. The pre-gymnasium 2.0 release is very buggy and has next to no error checking on the user API. The latest version may be more stable, but it pins a very recent version of Gymnasium that breaks compatiblity with many environments. We have a simple running script `here <https://github.com/PufferAI/PufferLib/blob/experimental/rllib_ppo.py>`_ that works with 2.0 for now. We will update this when the situation improves.
+We have previously supported RLlib and may again in the future. RLlib has not received updates in a while, and the current release is very buggy. We will update this if the situation improves.
 
 Environments
 ############
 
-We also provide a registry of environments and models that are supported out of the box. These environments are already set up for you in PufferTank and are used in our test cases to ensure they work with PufferLib. Several also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
+We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
 
 
 .. raw:: html
@@ -261,12 +263,12 @@ We also provide a registry of environments and models that are supported out of
 
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-            <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
-                <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
+            <a href="https://github.com/PWhiddy/PokemonRedExperiments" target="_blank">
+                <img src="https://img.shields.io/github/stars/PWhiddy/PokemonRedExperiments?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Pokemon Red" width="100px">
             </a>
         </div>
         <div>
-            <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+            <p><a href="https://github.com/PWhiddy/PokemonRedExperiments">Pokemon Red</a> is one of the original Pokemon games for the Game Boy. This project uses the game as an environment for reinforcement learning. We are actively supporting development on this one!</p>
         </div>
     </div>
 
@@ -283,12 +285,23 @@ We also provide a registry of environments and models that are supported out of
 
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
-            <a href="https://github.com/neuralmmo/environment" target="_blank">
-                <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+            <a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment" target="_blank">
+                <img src="https://img.shields.io/github/stars/Farama-Foundation/Arcade-Learning-Environment?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Arcade Learning Environment" width="100px">
             </a>
         </div>
         <div>
-            <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+            <p><a href="https://github.com/Farama-Foundation/Arcade-Learning-Environment">Arcade Learning Environment</a> provides a Gym interface for classic Atari games. This is the most popular benchmark for reinforcement learning algorithms.</p>
+        </div>
+    </div>
+
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/Farama-Foundation/Minigrid" target="_blank">
+                <img src="https://img.shields.io/github/stars/Farama-Foundation/Minigrid?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Minigrid" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://github.com/Farama-Foundation/Minigrid">Minigrid</a> is a 2D grid-world environment engine and a collection of built-in environments. It targets flexible and computationally efficient RL research.</p>
         </div>
     </div>
 
@@ -303,6 +316,17 @@ We also provide a registry of environments and models that are supported out of
         </div>
     </div>
 
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/neuralmmo/environment" target="_blank">
+                <img src="https://img.shields.io/github/stars/openai/neural-mmo?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star Neural MMO" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://neuralmmo.github.io">Neural MMO</a> is a massively multiagent environment for reinforcement learning. It combines large agent populations with high per-agent complexity and is the most actively maintained (by me) project on this list.</p>
+        </div>
+    </div>
+
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
             <a href="https://github.com/openai/procgen" target="_blank">
@@ -325,6 +349,17 @@ We also provide a registry of environments and models that are supported out of
         </div>
     </div>
 
+    <div style="display: flex; align-items: center; margin-bottom: 15px;">
+        <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
+            <a href="https://github.com/facebookresearch/minihack" target="_blank">
+                <img src="https://img.shields.io/github/stars/facebookresearch/minihack?labelColor=999999&color=66dcdc&cacheSeconds=100000" alt="Star MiniHack" width="100px">
+            </a>
+        </div>
+        <div>
+            <p><a href="https://github.com/facebookresearch/minihack">MiniHack Learning Environment</a> is a stripped-down version of NetHack with support for level editing and custom procedural generation.</p>
+        </div>
+    </div>
+
     <div style="display: flex; align-items: center; margin-bottom: 15px;">
         <div style="flex-shrink: 0; width: 100px; margin-right: 20px;">
             <a href="https://github.com/danijar/crafter" target="_blank">
@@ -362,11 +397,9 @@ Current Limitations
 ###################
 
 - No continuous action spaces (WIP)
-- Pre-gymnasium Gym and PettingZoo only (WIP)
 - Support for heterogeneous observations and actions requires you to specify teams such that each team has the same observation and action space. There's no good way around this.
 
 License
 #######
 
-PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI; we do not have private repositories with additional utilities.
-
+PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public, and we do not have private repositories with additional utilities.