
Commit

1.0 docs
Joseph Suarez committed Jun 8, 2024
1 parent 168aef4 commit 4dbbf04
Showing 25 changed files with 439 additions and 507 deletions.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/rst/api.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/rst/blog.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/rst/landing.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/rst/ocean.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 488005cb9aaf0985b8c6b957ed0c8b53
config: 4475271277a1702ac5fd77c51ef201f4
tags: 645f666f9bcd5a90fca523b33c5a78b7
15 changes: 9 additions & 6 deletions docs/build/html/_sources/rst/api.rst.txt
@@ -26,9 +26,9 @@ All included environments expose make_env and env_creator functions. make_env is

Additionally, all environments expose a Policy class with a baseline model. Note that not all environments have *custom* policies, and the default simply flattens observations before applying a linear layer. Atari, Procgen, Neural MMO, Nethack/Minihack, and Pokemon Red currently have reasonable policies.

The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through e.g. pufferlib.environments.squared.make_env
The PufferLib Squared environment is used as an example below. Everything is exposed through __init__, so you can call these methods through e.g. pufferlib.environments.ocean.make_env
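
A minimal sketch of those hooks for the Ocean module, based on the calls shown elsewhere in these docs; the default arguments and the exact Policy constructor signature are assumptions.

.. code-block:: python
import pufferlib.environments.ocean
# Build a wrapped, emulated instance of the default environment
env = pufferlib.environments.ocean.make_env()
# env_creator returns a callable you can hand to the vectorization layer
make_env = pufferlib.environments.ocean.env_creator()
# Baseline policy exposed by the environment package (signature assumed)
policy = pufferlib.environments.ocean.Policy(env)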

.. autoclass:: pufferlib.environments.ocean.squared.Squared
.. autoclass:: pufferlib.environments.ocean.ocean.Squared
:members:
:undoc-members:
:noindex:
@@ -41,7 +41,7 @@ The PufferLib Squared environment is used as an example below. Everything is exp
Models
######

PufferLib model default policies and optional API. These are not required to use PufferLib.
PufferLib model default policies. They are vanilla PyTorch policies with no custom PufferLib API. Optionally, you can split the forward pass into encode and decode functions. This allows you to use our convenience wrapper for LSTM support.
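
A minimal sketch of that split, assuming the encode_observations/decode_actions method names used by the default policies, a flat observation, and a discrete action space; treat it as illustrative rather than the exact base-class API.

.. code-block:: python
import torch
import torch.nn as nn
class Policy(nn.Module):
    def __init__(self, env, hidden_size=128):
        super().__init__()
        obs_size = env.single_observation_space.shape[0]
        self.encoder = nn.Linear(obs_size, hidden_size)
        self.actor = nn.Linear(hidden_size, env.single_action_space.n)
        self.value_head = nn.Linear(hidden_size, 1)
    def encode_observations(self, observations):
        # Shared representation; the second return holds recurrent state lookup (unused here)
        return torch.relu(self.encoder(observations.float())), None
    def decode_actions(self, hidden, lookup):
        # Action logits and a value estimate
        return self.actor(hidden), self.value_head(hidden)
    def forward(self, observations):
        hidden, lookup = self.encode_observations(observations)
        return self.decode_actions(hidden, lookup)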

.. automodule:: pufferlib.models
:members:
@@ -53,17 +53,20 @@ Vectorization

Distributed backends for PufferLib-wrapped environments

.. autoclass:: pufferlib.vectorization.Serial
.. autofunction:: pufferlib.vector.make
:noindex:

.. autoclass:: pufferlib.vector.Serial
:members:
:undoc-members:
:noindex:

.. autoclass:: pufferlib.vectorization.Multiprocessing
.. autoclass:: pufferlib.vector.Multiprocessing
:members:
:undoc-members:
:noindex:

.. autoclass:: pufferlib.vectorization.Ray
.. autoclass:: pufferlib.vector.Ray
:members:
:undoc-members:
:noindex:
82 changes: 50 additions & 32 deletions docs/build/html/_sources/rst/landing.rst.txt
@@ -6,7 +6,7 @@
You have an environment, a PyTorch model, and a reinforcement learning library that are designed to work together but don't. PufferLib provides one-line wrappers that make them play nice.

.. card::
:link: https://colab.research.google.com/drive/142tl_9MiEDXX-E5-6kjwZsOmRYPcFrFU?usp=sharing
:link: https://colab.research.google.com/drive/1pK5QQG9-MfVdbUNr2vXr2l6zJBS-au1V?usp=sharing
:width: 75%
:margin: 4 2 auto auto
:text-align: center
@@ -76,6 +76,10 @@ Join our community Discord for support and Discussion, follow my Twitter for new

**Joseph Suarez**: Creator and developer of PufferLib

**thatguy**: Several performance improvements with torch compilation; major pokerl contributor.

**Kyoung Whan Choe (최경환)**: Testing and bug fixes

**David Bloomin**: 0.4 policy pool/store/selector

**Nick Jenkins**: Layout for the system architecture diagram. Adversary.design.
@@ -86,6 +90,9 @@ Join our community Discord for support and Discussion, follow my Twitter for new

**You can open this guide in a Colab notebook by clicking the demo button at the top of this page**

Emulation
#########

Complex environments may have hierarchical observations and actions, variable numbers of agents, and other quirks that make them difficult to work with and incompatible with standard reinforcement learning libraries. PufferLib's emulation layer makes every environment look like it has flat observations/actions and a constant number of agents. Here's how it works with NetHack and Neural MMO, two notoriously complex environments.

.. code-block:: python
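# A sketch of the emulation wrappers described above, assuming nle and nmmo are
# installed; the env_creator arguments are illustrative assumptions, not the
# exact committed example.
import nle, nmmo
import pufferlib.emulation

def nmmo_creator():
    # Multi-agent PettingZoo-style environment
    return pufferlib.emulation.PettingZooPufferEnv(env_creator=nmmo.Env)

def nethack_creator():
    # Single-agent Gymnasium-style environment
    return pufferlib.emulation.GymnasiumPufferEnv(env_creator=nle.env.NLE)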
@@ -107,22 +114,16 @@ The wrappers give you back a Gymnasium/PettingZoo compliant environment. There i

.. code-block:: python
import pufferlib.vectorization
vec = pufferlib.vectorization.Serial
# vec = pufferlib.vectorization.Multiprocessing
# vec = pufferlib.vectorization.Ray
# Vectorization API. Specify total number of environments and number per worker
# Setting env_pool=True can be much faster but requires some tweaks to learning code
envs = vec(nmmo_creator, num_envs=4, envs_per_worker=2, env_pool=False)
import pufferlib.vector
backend = pufferlib.vector.Serial #or Multiprocessing, Ray
envs = pufferlib.vector.make(nmmo_creator, backend=backend, num_envs=4)
# Synchronous API - reset/step
# obs = envs.reset()[0]
obs, infos = envs.reset()
# Asynchronous API - async_reset, send/recv
envs.async_reset()
obs = envs.recv()[0]
obs, rewards, terminals, truncateds, infos, env_id, mask = envs.recv()
Our backends support asynchronous on-policy sampling through a Python implementation of EnvPool. This makes them *faster* than the implementations that ship with most RL libraries. We suggest Serial for debugging and Multiprocessing for most training runs. Ray is a good option if you need to scale beyond a single machine.

@@ -156,51 +157,68 @@ PufferLib allows you to write vanilla PyTorch policies and use them with multipl
policy = Policy(envs.driver_env)
cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
obs, rewards, terminals, truncateds, infos = envs.step(actions)
envs.close()
There's also an optional policy base class for PufferLib. It just breaks the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here's a complete example.
Optionally, you can break the forward pass into an encode and decode step, which allows us to handle recurrence for you. So far, the code above is fully general and does not rely on PufferLib support for specific environments. For convenience, we also provide environment hooks with standard wrappers and baseline models. Here's a complete example.

.. code-block:: python
import torch
import pufferlib.models
import pufferlib.vectorization
import pufferlib.vector
import pufferlib.frameworks.cleanrl
import pufferlib.environments.nmmo
envs = pufferlib.vectorization.Multiprocessing(
env_creator=pufferlib.environments.nmmo.make_env,
num_envs=4, envs_per_worker=2)
make_env = pufferlib.environments.nmmo.env_creator()
envs = pufferlib.vector.make(make_env, backend=backend, num_envs=4)
policy = pufferlib.environments.nmmo.Policy(envs.driver_env)
cleanrl_policy = pufferlib.frameworks.cleanrl.Policy(policy)
env_outputs = envs.reset()[0]
obs = torch.Tensor(env_outputs)
obs = torch.from_numpy(env_outputs)
actions = cleanrl_policy.get_action_and_value(obs)[0].numpy()
obs, rewards, terminals, truncateds, infos, env_id, mask = envs.step(actions)
next_obs, rewards, terminals, truncateds, infos = envs.step(actions)
envs.close()
It's that simple -- almost. If you have an environment with structured observations, you'll have to unpack them in the network forward pass since PufferLib will flatten them in emulation. We provide a utility for this.

.. code-block:: python
obs = pufferlib.emulation.unpack_batched_obs(
env_outputs,
envs.driver_env.flat_observation_space,
envs.driver_env.flat_observation_structure
)
dtype = pufferlib.pytorch.nativize_dtype(envs.driver_env.emulated)
env_outputs = pufferlib.pytorch.nativize_tensor(obs, dtype)
print('Packed tensor:', obs.shape)
print('Unpacked:', env_outputs.keys())
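
As a sketch of where this unpacking typically lives, reusing the nativize_dtype/nativize_tensor calls and the emulated attribute from the snippet above; the dict-valued structure of the result is an assumption here.

.. code-block:: python
import torch
import torch.nn as nn
import pufferlib.pytorch
class StructuredPolicy(nn.Module):
    def __init__(self, env, hidden_size=128):
        super().__init__()
        # Cache the structured dtype once; env.emulated describes the flattened space
        self.dtype = pufferlib.pytorch.nativize_dtype(env.emulated)
        self.net = nn.LazyLinear(hidden_size)
    def forward(self, flat_obs):
        # Recover the original nested observation structure from the flat batch
        structured = pufferlib.pytorch.nativize_tensor(flat_obs, self.dtype)
        features = [v.float().flatten(1) for v in structured.values()]
        return self.net(torch.cat(features, dim=1))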
That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. Single-agent environments should work with SB3, and other integrations will be added based on demand, so let us know what you want!

Vectorization
#############

Our Multiprocessing backend is fast -- much faster than Gymnasium's in most cases. Atari runs 50-60% faster synchronously and 5x faster asynchronously in our latest benchmark, and some environments like NetHack can be 10x faster even synchronously, with no API changes. PufferLib implements the following optimizations:

**A Python implementation of EnvPool.** Simulates more environments than are needed per batch and returns a batch of observations as soon as one is ready. Requires the async send/recv API instead of the sync step API (see the sketch at the end of this section).

**Multiple environments per worker.** Important for fast environments.

**Shared memory.** Unlike Gymnasium's implementation, we use a single buffer that is shared across environments.

**Shared flags.** Workers busy-wait on an unlocked flag instead of signaling via pipes or queues. This virtually eliminates interprocess communication overhead. Pipes are used once per episode to communicate aggregated infos.

**Zero-copy batching.** Because we use a single buffer for shared memory, we can return observations from contiguous subsets of workers without ever copying observations. The only case this does not cover is full-async mode.

**Native multiagent support.** It's not an extra wrapper or slow bolt-on feature. PufferLib treats single-agent and multi-agent environments the same. API differences are handled at the emulation level.

That's all you need to get started. The PufferLib repository contains full-length CleanRL scripts with PufferLib integration. SB3 and other integrations coming soon!
Most of these optimizations are made possible by a hard requirement that environments go through PufferLib emulation, which means the vectorization layer never has to handle structured data itself.
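
A minimal sketch of the async sampling pattern these optimizations enable, reusing the nmmo_creator from the emulation section; the single_action_space attribute and the worker/batch sizing keyword arguments are assumptions.

.. code-block:: python
import numpy as np
import pufferlib.vector
# Simulate more envs than one batch needs so a full batch is always ready
envs = pufferlib.vector.make(nmmo_creator, backend=pufferlib.vector.Multiprocessing,
    num_envs=8, num_workers=4, batch_size=4)
envs.async_reset()
for _ in range(100):
    obs, rewards, terminals, truncateds, infos, env_id, mask = envs.recv()
    # Random actions stand in for a policy forward pass
    actions = np.stack([envs.single_action_space.sample() for _ in range(len(obs))])
    envs.send(actions)
envs.close()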

Libraries
#########

PufferLib's emulation layer adheres to the Gym and PettingZoo APIs: you can use it with *any* environment and learning library (subject to Limitations). The libraries and environments below are just the ones we've tested. We also provide additional tools to make them easier to work with.

PufferLib provides *pufferlib.frameworks* for the learning libraries below. These are short wrappers over your vanilla PyTorch policy that handle learning library API details for you. Additionally, if you use our *optional* model API, which just requires you to split your *forward* function into an *encode* and *decode* portion, we can handle recurrence for you. This is the approach we use in our default policies.
PufferLib provides *pufferlib.frameworks* for the learning libraries below. These are short wrappers over your vanilla PyTorch policy that handle learning library API details for you. Additionally, if you split your *forward* function into an *encode* and *decode* portion, we can handle recurrence for you. This is the approach we use in our default policies.

.. raw:: html

@@ -216,7 +234,7 @@
</div>

.. card::
:link: https://colab.research.google.com/drive/1OMcaJnCAF1UiCJxKIxSS-RdZTuonItYT?usp=sharing
:link: https://colab.research.google.com/drive/1Zj4_vT36VlMsk0JHVx2cxxW27VdeS3mJ?usp=sharing
:width: 75%
:margin: 4 2 auto auto
:text-align: center
@@ -245,7 +263,7 @@ We have previously supported RLLib and may again in the future. RLlib has not re
Environments
############

PufferLib ships with Ocean, our first-party testing suite. We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.
PufferLib ships with Ocean, our first-party testing suite, which will let you catch 90% of implementation bugs in a 10 second training run. We also provide integrations for many environments out of the box. Non-pip dependencies are already set up for you in PufferTank. Several environments also include reasonable baseline policies. Join our Discord if you would like to add setup and tests for new environments or improvements to any of the baselines.


.. raw:: html
@@ -396,10 +414,10 @@ PufferLib ships with Ocean, our first-party testing suite. We also provide integ
Current Limitations
###################

- No continuous action spaces (WIP)
- Support for heterogeneous observations and actions requires you to specify teams such that each team has the same observation and action space. There's no good way around this.
- No continuous action spaces (planned for after 1.0)
- Each agent must have the same observation and action space. This is true of most RL libraries and is hard to work around without sacrificing performance or simplicity.

License
#######

PufferLib is free and open-source software under the MIT license. This is the full set of tools maintained by PufferAI. Dev branches are public and we do not have private repositories with additional utilities.
PufferLib is free and open-source software under the MIT license.
24 changes: 16 additions & 8 deletions docs/build/html/_sources/rst/ocean.rst.txt
@@ -2,62 +2,70 @@

|
🌊 Ocean is PufferLib's suite of first-party environments. They are small and can be trained from scratch in 30 seconds to 2 minutes. Use Ocean as a sanity check for your training code instead of overnighting heavier runs.
🌊 Ocean is PufferLib's suite of first-party environments. They are small and can be trained from scratch in 10 seconds to 2 minutes. Use Ocean as a sanity check for your training code instead of overnighting heavier runs.
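
A minimal smoke test in that spirit, using the vectorization API from the landing page; the default arguments to env_creator and the single_action_space attribute are assumptions.

.. code-block:: python
import numpy as np
import pufferlib.vector
import pufferlib.environments.ocean
# A short random rollout is often enough to surface shape and dtype bugs
envs = pufferlib.vector.make(pufferlib.environments.ocean.env_creator(),
    backend=pufferlib.vector.Serial, num_envs=2)
obs, infos = envs.reset()
for _ in range(1000):
    actions = np.stack([envs.single_action_space.sample() for _ in range(len(obs))])
    obs, rewards, terminals, truncateds, infos = envs.step(actions)
envs.close()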

.. image:: /resource/ocean.png

Make Functions
**************

.. automodule:: pufferlib.environments.ocean.environment
:members:
:undoc-members:
:noindex:

Squared
*******

.. autoclass:: pufferlib.environments.ocean.squared.Squared
.. autoclass:: pufferlib.environments.ocean.ocean.Squared
:members:
:undoc-members:
:noindex:

Password (exploration environment)
**********************************

.. autoclass:: pufferlib.environments.ocean.password.Password
.. autoclass:: pufferlib.environments.ocean.ocean.Password
:members:
:undoc-members:
:noindex:

Stochastic
**********

.. autoclass:: pufferlib.environments.ocean.stochastic.Stochastic
.. autoclass:: pufferlib.environments.ocean.ocean.Stochastic
:members:
:undoc-members:
:noindex:

Memory
******

.. autoclass:: pufferlib.environments.ocean.memory.Memory
.. autoclass:: pufferlib.environments.ocean.ocean.Memory
:members:
:undoc-members:
:noindex:

Multiagent
**********

.. autoclass:: pufferlib.environments.ocean.multiagent.Multiagent
.. autoclass:: pufferlib.environments.ocean.ocean.Multiagent
:members:
:undoc-members:
:noindex:

Spaces
******

.. autoclass:: pufferlib.environments.ocean.spaces.Spaces
.. autoclass:: pufferlib.environments.ocean.ocean.Spaces
:members:
:undoc-members:
:noindex:

Bandit
******

.. autoclass:: pufferlib.environments.ocean.bandit.Bandit
.. autoclass:: pufferlib.environments.ocean.ocean.Bandit
:members:
:undoc-members:
:noindex:
2 changes: 1 addition & 1 deletion docs/build/html/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '0.7.0',
VERSION: '1.0.0',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
13 changes: 8 additions & 5 deletions docs/build/html/genindex.html
@@ -4,7 +4,7 @@
<meta name="viewport" content="width=device-width,initial-scale=1"/>
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="#" /><link rel="search" title="Search" href="search.html" />

<!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Index - PufferLib 0.7.0 documentation</title>
<!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 --><title>Index - PufferLib 1.0.0 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
<link rel="stylesheet" type="text/css" href="_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -188,7 +188,7 @@
</label>
</div>
<div class="header-center">
<a href="index.html"><div class="brand">PufferLib 0.7.0 documentation</div></a>
<a href="index.html"><div class="brand">PufferLib 1.0.0 documentation</div></a>
</div>
<div class="header-right">
<div class="theme-toggle-container theme-toggle-header">
@@ -211,7 +211,7 @@
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">


<span class="sidebar-brand-text">PufferLib 0.7.0 documentation</span>
<span class="sidebar-brand-text">PufferLib 1.0.0 documentation</span>

</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
@@ -221,7 +221,9 @@
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
<p class="caption" role="heading"><span class="caption-text">User Guide</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html">Libraries</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html">Emulation</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html#vectorization">Vectorization</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html#libraries">Libraries</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html#environments">Environments</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html#current-limitations">Current Limitations</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/landing.html#license">License</a></li>
@@ -238,7 +240,8 @@
</ul>
<p class="caption" role="heading"><span class="caption-text">🌊 Ocean</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html">Squared</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html">Make Functions</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html#squared">Squared</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html#password-exploration-environment">Password (exploration environment)</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html#stochastic">Stochastic</a></li>
<li class="toctree-l1"><a class="reference internal" href="rst/ocean.html#memory">Memory</a></li>