diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
index 579174b..1cc018d 100644
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
diff --git a/docs/build/doctrees/rst/blog.doctree b/docs/build/doctrees/rst/blog.doctree
index 14d9e8c..9b36338 100644
Binary files a/docs/build/doctrees/rst/blog.doctree and b/docs/build/doctrees/rst/blog.doctree differ
diff --git a/docs/build/doctrees/rst/ocean.doctree b/docs/build/doctrees/rst/ocean.doctree
index 7413c8d..49eecd1 100644
Binary files a/docs/build/doctrees/rst/ocean.doctree and b/docs/build/doctrees/rst/ocean.doctree differ
diff --git a/docs/build/html/_images/ocean.png b/docs/build/html/_images/ocean.png
new file mode 100644
index 0000000..2eff103
Binary files /dev/null and b/docs/build/html/_images/ocean.png differ
diff --git a/docs/build/html/_sources/rst/blog.rst.txt b/docs/build/html/_sources/rst/blog.rst.txt
index cfea71b..08f602d 100644
--- a/docs/build/html/_sources/rst/blog.rst.txt
+++ b/docs/build/html/_sources/rst/blog.rst.txt
@@ -11,6 +11,39 @@
      </video>
    </center>
 
+🐡🌊 An Ocean of Environments for Learning Pufferfish
+#####################################################
+
+Ocean is a small suite of environments that train from scratch in 30 seconds and render in a terminal. Each environment is a sanity check for a common implementation bug. Use Ocean as a quick verification test whenever you make small code changes.
+
+.. image:: ../resource/ocean.png
+   :width: 100%
+   :align: center
+
+**Memory:** The agent is shown one binary token at a time and must recite them back after a pause. Do not make the sequence too long or you start testing credit assignment.
+
+**Stochasticity:** The agent is rewarded for learning a particular nondeterministic action distribution. Do not use an architecture with memory or the agent can solve the task without stochasticity.
+
+**Exploration:** The agent is rewarded for guessing a specific binary sequence. Do not tune your entropy coefficients higher than you would use in your actual environments, since that is the point of the test.
+
+**Bandit:** The agent is rewarded for solving a multi-armed bandit problem. This environment is included for historical importance. Any reasonable implementation should solve the default setting.
+
+**Squared:** The agent is rewarded for moving to targets that spawn around the edges of a square. There are settings to test memory, exploration, and stochasticity separately or jointly to help you prod at deeper issues with your implementation.
+
+This project is heavily inspired by BSuite, a DeepMind project with similar if more benchmarky goals. BSuite was a bit too heavy for my liking and didn’t fit the niche of a quick and portable verification suite.
+
+I had a few issues designing these. The memory task is apparently a standard RNN copying task (I would be surprised if it weren’t). But it’s a bit different in an RL context because you still have to learn credit assignment. I don’t think there is a way to fully isolate learning only memory outside of a simple 1-step problem. Try increasing the memory sequence length or delay and you will quickly find that the problem gets harder to learn.
+
+The exploration environment is the only one that just worked. You can increase the password length and the problem gets harder to learn at about the rate you would expect. It’s just guess and check, so once you happen to get the password right, the goal is to learn as much as possible from that single instance. Any prioritized replay would trivialize the problem.
+
+The stochastic environment took the longest. Initially, I was looking for one where the optimal policy was still stochastic and nontrivial even if the agent had memory. I could not figure out how to make one of these, and Twitter seems to think it’s impossible. They’re probably right, though you might be able to alter the setup conditions a bit, still test for the same thing, and have something that works better. For now, this is a quick and consistent test.
+
+I wrote the bandit environment earlier in the project, and it seems kind of useful, so I left it in the release. Probably a good idea to have at least some version of a problem this historically important easily accessible in PufferLib.
+
+I wrote Squared over the summer. I’m rather fond of it as a test environment, since it is fairly scalable. You spawn at the center of a square and targets spawn around the outside. You get a reward the first time you hit each target and are teleported to the center whenever you hit a target. This means that the optimal policy is stochastic: you place equal probability on moving towards each target and then deterministically move towards the target you have selected. It’s interesting because the optimal policy is stochastic in some states and deterministic in others. You can also turn the problem into a memory test by using a recurrent network. In any event, it’s similar to the bandit problem in that it combines elements of the simpler tests, but it’s a bit more tunable and interpretable.
+
+Let me know if you have other ideas for useful test environments. Lately, I’ve landed on either very simple or very complex environments as being the most useful for research. Many of the tasks in the middle (looking at you, Atari) are too slow to be useful as quick tests and too simple to test interesting ideas.
+
 PufferLib 0.5: A Bigger EnvPool for Growing Puffers
 ###################################################
 
diff --git a/docs/build/html/_sources/rst/ocean.rst.txt b/docs/build/html/_sources/rst/ocean.rst.txt
index 41746b0..3277f1c 100644
--- a/docs/build/html/_sources/rst/ocean.rst.txt
+++ b/docs/build/html/_sources/rst/ocean.rst.txt
@@ -4,6 +4,8 @@
 
 🌊 Ocean is PufferLib's suite of first-party environments. They are small and can be trained from scratch in 30 seconds to 2 minutes. Use Ocean as a sanity check for your training code instead of overnighting heavier runs.
 
+.. image:: /resource/ocean.png
+
 Squared
 *******
 
diff --git a/docs/build/html/genindex.html b/docs/build/html/genindex.html
index fae9b89..bbb6059 100644
--- a/docs/build/html/genindex.html
+++ b/docs/build/html/genindex.html
@@ -246,7 +246,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
diff --git a/docs/build/html/index.html b/docs/build/html/index.html
index 5eb036c..406b1f0 100644
--- a/docs/build/html/index.html
+++ b/docs/build/html/index.html
@@ -248,7 +248,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
@@ -321,7 +322,8 @@ <h1>Index<a class="headerlink" href="#index" title="Permalink to this heading">#
 <div class="toctree-wrapper compound">
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a><ul>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#the-simulation-crisis">The Simulation Crisis</a></li>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#the-solution">The Solution</a></li>
 <li class="toctree-l2"><a class="reference internal" href="rst/blog.html#experiments">Experiments</a></li>
diff --git a/docs/build/html/objects.inv b/docs/build/html/objects.inv
index 72668b3..319581f 100644
Binary files a/docs/build/html/objects.inv and b/docs/build/html/objects.inv differ
diff --git a/docs/build/html/rst/api.html b/docs/build/html/rst/api.html
index c22931e..9f8768e 100644
--- a/docs/build/html/rst/api.html
+++ b/docs/build/html/rst/api.html
@@ -248,7 +248,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
diff --git a/docs/build/html/rst/blog.html b/docs/build/html/rst/blog.html
index bc49b9b..42dfcfc 100644
--- a/docs/build/html/rst/blog.html
+++ b/docs/build/html/rst/blog.html
@@ -6,7 +6,7 @@
 <link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="prev" title="Squared" href="ocean.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
-        <title>PufferLib 0.5: A Bigger EnvPool for Growing Puffers - PufferLib 0.6.0 documentation</title>
+        <title>🐡🌊 An Ocean of Environments for Learning Pufferfish - PufferLib 0.6.0 documentation</title>
       <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
     <link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
     <link rel="stylesheet" type="text/css" href="../_static/design-style.1e8bd061cd6da7fc9cf755528e8ffc24.min.css" />
@@ -201,7 +201,7 @@
           <svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
         </button>
       </div>
-      <label class="toc-overlay-icon toc-header-icon" for="__toc">
+      <label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
         <div class="visually-hidden">Toggle table of contents sidebar</div>
         <i class="icon"><svg><use href="#svg-toc"></use></svg></i>
       </label>
@@ -248,7 +248,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul class="current">
-<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
@@ -279,7 +280,7 @@
               <svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
             </button>
           </div>
-          <label class="toc-overlay-icon toc-content-icon" for="__toc">
+          <label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
             <div class="visually-hidden">Toggle table of contents sidebar</div>
             <i class="icon"><svg><use href="#svg-toc"></use></svg></i>
           </label>
@@ -291,7 +292,24 @@
     <source src="../_static/banner.mp4" type="video/mp4">
     Your browser does not support this video.
   </video>
-</center><section id="pufferlib-0-5-a-bigger-envpool-for-growing-puffers">
+</center><section id="an-ocean-of-environments-for-learning-pufferfish">
+<h1>🐡🌊 An Ocean of Environments for Learning Pufferfish<a class="headerlink" href="#an-ocean-of-environments-for-learning-pufferfish" title="Permalink to this heading">#</a></h1>
+<p>Ocean is a small suite of environments that train from scratch in 30 seconds and render in a terminal. Each environment is a sanity check for a common implementation bug. Use Ocean as a quick verification test whenever you make small code changes.</p>
+<a class="reference internal image-reference" href="../_images/ocean.png"><img alt="../_images/ocean.png" class="align-center" src="../_images/ocean.png" style="width: 100%;" /></a>
+<p><strong>Memory:</strong> The agent is shown one binary token at a time and must recite them back after a pause. Do not make the sequence too long or you start testing credit assignment.</p>
+<p><strong>Stochasticity:</strong> The agent is rewarded for learning a particular nondeterministic action distribution. Do not use an architecture with memory or the agent can solve the task without stochasticity.</p>
+<p><strong>Exploration:</strong> The agent is rewarded for guessing a specific binary sequence. Do not tune your entropy coefficients higher than you would use in your actual environments, since that is the point of the test.</p>
+<p><strong>Bandit:</strong> The agent is rewarded for solving a multi-armed bandit problem. This environment is included for historical importance. Any reasonable implementation should solve the default setting.</p>
+<p><strong>Squared:</strong> The agent is rewarded for moving to targets that spawn around the edges of a square. There are settings to test memory, exploration, and stochasticity separately or jointly to help you prod at deeper issues with your implementation.</p>
+<p>This project is heavily inspired by BSuite, a DeepMind project with similar if more benchmarky goals. BSuite was a bit too heavy for my liking and didn’t fit the niche of a quick and portable verification suite.</p>
+<p>I had a few issues designing these. The memory task is apparently a standard RNN copying task (I would be surprised if it weren’t). But it’s a bit different in an RL context because you still have to learn credit assignment. I don’t think there is a way to fully isolate learning only memory outside of a simple 1-step problem. Try increasing the memory sequence length or delay and you will quickly find that the problem gets harder to learn.</p>
+<p>The exploration environment is the only one that just worked. You can increase the password length and the problem gets harder to learn at about the rate you would expect. It’s just guess and check, so once you happen to get the password right, the goal is to learn as much as possible from that single instance. Any prioritized replay would trivialize the problem.</p>
+<p>The stochastic environment took the longest. Initially, I was looking for one where the optimal policy was still stochastic and nontrivial even if the agent had memory. I could not figure out how to make one of these, and Twitter seems to think it’s impossible. They’re probably right, though you might be able to alter the setup conditions a bit, still test for the same thing, and have something that works better. For now, this is a quick and consistent test.</p>
+<p>I wrote the bandit environment earlier in the project, and it seems kind of useful, so I left it in the release. Probably a good idea to have at least some version of a problem this historically important easily accessible in PufferLib.</p>
+<p>I wrote Squared over the summer. I’m rather fond of it as a test environment, since it is fairly scalable. You spawn at the center of a square and targets spawn around the outside. You get a reward the first time you hit each target and are teleported to the center whenever you hit a target. This means that the optimal policy is stochastic: you place equal probability on moving towards each target and then deterministically move towards the target you have selected. It’s interesting because the optimal policy is stochastic in some states and deterministic in others. You can also turn the problem into a memory test by using a recurrent network. In any event, it’s similar to the bandit problem in that it combines elements of the simpler tests, but it’s a bit more tunable and interpretable.</p>
+<p>Let me know if you have other ideas for useful test environments. Lately, I’ve landed on either very simple or very complex environments as being the most useful for research. Many of the tasks in the middle (looking at you, Atari) are too slow to be useful as quick tests and too simple to test interesting ideas.</p>
+</section>
+<section id="pufferlib-0-5-a-bigger-envpool-for-growing-puffers">
 <h1>PufferLib 0.5: A Bigger EnvPool for Growing Puffers<a class="headerlink" href="#pufferlib-0-5-a-bigger-envpool-for-growing-puffers" title="Permalink to this heading">#</a></h1>
 <p>This is what reinforcement learning does to your CPU utilization:</p>
 <figure class="align-default">
@@ -502,47 +520,8 @@ <h2>Next Steps<a class="headerlink" href="#next-steps" title="Permalink to this
         
       </footer>
     </div>
-    <aside class="toc-drawer">
-      
+    <aside class="toc-drawer no-toc">
       
-      <div class="toc-sticky toc-scroll">
-        <div class="toc-title-container">
-          <span class="toc-title">
-            On this page
-          </span>
-        </div>
-        <div class="toc-tree-container">
-          <div class="toc-tree">
-            <ul>
-<li><a class="reference internal" href="#">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a><ul>
-<li><a class="reference internal" href="#the-simulation-crisis">The Simulation Crisis</a></li>
-<li><a class="reference internal" href="#the-solution">The Solution</a></li>
-<li><a class="reference internal" href="#experiments">Experiments</a></li>
-<li><a class="reference internal" href="#technical-details-and-gotchas">Technical Details and Gotchas</a></li>
-</ul>
-</li>
-<li><a class="reference internal" href="#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a><ul>
-<li><a class="reference internal" href="#emulation">Emulation</a></li>
-<li><a class="reference internal" href="#vectorization">Vectorization</a></li>
-<li><a class="reference internal" href="#puffertank">PufferTank</a></li>
-<li><a class="reference internal" href="#policies">Policies</a></li>
-<li><a class="reference internal" href="#error-handling">Error Handling</a></li>
-<li><a class="reference internal" href="#miscellaneous">Miscellaneous</a></li>
-</ul>
-</li>
-<li><a class="reference internal" href="#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a><ul>
-<li><a class="reference internal" href="#problem-statement">Problem Statement</a></li>
-<li><a class="reference internal" href="#cleanrl-demos">CleanRL Demos</a></li>
-<li><a class="reference internal" href="#pufferlib-emulation">PufferLib Emulation</a></li>
-<li><a class="reference internal" href="#pufferlib-vectorization">PufferLib Vectorization</a></li>
-<li><a class="reference internal" href="#next-steps">Next Steps</a></li>
-</ul>
-</li>
-</ul>
-
-          </div>
-        </div>
-      </div>
       
       
     </aside>
diff --git a/docs/build/html/rst/landing.html b/docs/build/html/rst/landing.html
index 91906d4..ac3fba9 100644
--- a/docs/build/html/rst/landing.html
+++ b/docs/build/html/rst/landing.html
@@ -248,7 +248,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
diff --git a/docs/build/html/rst/ocean.html b/docs/build/html/rst/ocean.html
index 419d836..56f4e7a 100644
--- a/docs/build/html/rst/ocean.html
+++ b/docs/build/html/rst/ocean.html
@@ -3,7 +3,7 @@
   <head><meta charset="utf-8"/>
     <meta name="viewport" content="width=device-width,initial-scale=1"/>
     <meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
-<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="PufferLib 0.5: A Bigger EnvPool for Growing Puffers" href="blog.html" /><link rel="prev" title="Emulation" href="api.html" />
+<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="🐡🌊 An Ocean of Environments for Learning Pufferfish" href="blog.html" /><link rel="prev" title="Emulation" href="api.html" />
 
     <!-- Generated with Sphinx 5.0.0 and Furo 2023.03.27 -->
         <title>Squared - PufferLib 0.6.0 documentation</title>
@@ -248,7 +248,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
@@ -290,6 +291,7 @@
 <div class="line"><br /></div>
 </div>
 <p>🌊 Ocean is PufferLib’s suite of first-party environments. They are small and can be trained from scratch in 30 seconds to 2 minutes. Use Ocean as a sanity check for your training code instead of overnighting heavier runs.</p>
+<img alt="../_images/ocean.png" src="../_images/ocean.png" />
 <section id="squared">
 <h1>Squared<a class="headerlink" href="#squared" title="Permalink to this heading">#</a></h1>
 <dl class="py class">
@@ -562,7 +564,7 @@ <h1>Bandit<a class="headerlink" href="#bandit" title="Permalink to this heading"
                 <div class="context">
                   <span>Next</span>
                 </div>
-                <div class="title">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</div>
+                <div class="title">🐡🌊 An Ocean of Environments for Learning Pufferfish</div>
               </div>
               <svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
             </a>
diff --git a/docs/build/html/search.html b/docs/build/html/search.html
index 8e7fd86..7d8dd8a 100644
--- a/docs/build/html/search.html
+++ b/docs/build/html/search.html
@@ -245,7 +245,8 @@
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Blog</span></p>
 <ul>
-<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html">🐡🌊 An Ocean of Environments for Learning Pufferfish</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-5-a-bigger-envpool-for-growing-puffers">PufferLib 0.5: A Bigger EnvPool for Growing Puffers</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-4-ready-to-take-on-bigger-fish">PufferLib 0.4: Ready to Take on Bigger Fish</a></li>
 <li class="toctree-l1"><a class="reference internal" href="rst/blog.html#pufferlib-0-2-ready-to-take-on-the-big-fish">PufferLib 0.2: Ready to Take on the Big Fish</a></li>
 </ul>
diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js
index b5417f6..27663e1 100644
--- a/docs/build/html/searchindex.js
+++ b/docs/build/html/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["index", "rst/api", "rst/blog", "rst/landing", "rst/ocean"], "filenames": ["index.rst", "rst/api.rst", "rst/blog.rst", "rst/landing.rst", "rst/ocean.rst"], "titles": ["Index", "Emulation", "PufferLib 0.5: A Bigger EnvPool for Growing Puffers", "Libraries", "Squared"], "terms": {"librari": [0, 2], "environ": [0, 2], "current": [0, 1, 2, 4], "limit": [0, 2], "licens": 0, "emul": [0, 3], "model": [0, 2, 3], "vector": [0, 3, 4], "cleanrl": [0, 3], "integr": [0, 2, 3], "sb3": [0, 3], "bind": [0, 2], "rllib": [0, 2, 3], "squar": [0, 1, 2], "password": 0, "explor": 0, "stochast": 0, "memori": 0, "bandit": 0, "pufferlib": [0, 1, 3, 4], "0": [0, 3, 4], "5": [0, 4], "A": [0, 1, 4], "bigger": 0, "envpool": [0, 3], "grow": 0, "puffer": [0, 3], "The": [0, 1, 3, 4], "simul": [0, 1, 3, 4], "crisi": 0, "solut": 0, "experi": [0, 3], "technic": 0, "detail": [0, 3], "gotcha": 0, "4": [0, 1, 3, 4], "readi": 0, "take": [0, 1, 3], "fish": 0, "puffertank": [0, 3], "polici": [0, 1, 3, 4], "error": 0, "handl": [0, 1, 3], "miscellan": 0, "2": [0, 3, 4], "big": 0, "problem": [0, 4], "statement": 0, "demo": [0, 1, 3], "next": [0, 4], "step": [0, 1, 3, 4], "our": [1, 2, 3], "public": [1, 2, 3], "api": [1, 2, 3], "advanc": 1, "user": 1, "can": [1, 2, 3, 4], "check": [1, 2, 4], "sourc": [1, 2, 3], "addit": [1, 2, 3], "util": [1, 2, 3], "note": [1, 3], "we": [1, 2, 3], "tend": 1, "move": [1, 4], "around": [1, 2, 3], "more": [1, 2], "often": [1, 4], "contribut": [1, 2], "welcom": 1, "wrap": [1, 2, 3], "your": [1, 2, 3, 4], "broad": 1, "compat": [1, 2], "support": [1, 2, 3], "pass": [1, 2, 3], "creator": [1, 3], "function": [1, 2, 3], "class": [1, 2, 3, 4], "env": [1, 2, 3], "object": [1, 2, 3], "return": [1, 2, 3], "pufferenv": [1, 2], "same": [1, 2, 3, 4], "gym": [1, 2, 3], "pettingzoo": [1, 2, 3], "gymnasiumpufferenv": [1, 3], "none": [1, 4], "env_creat": [1, 2, 3], "env_arg": 1, "env_kwarg": 1, "postprocessor_cl": 1, "basicpostprocessor": 1, "properti": 1, 
"observation_spac": [1, 4], "flatten": [1, 2, 3], "singl": [1, 2, 3], "tensor": [1, 2, 3], "observ": [1, 2, 3, 4], "space": [1, 2, 3, 4], "action_spac": [1, 4], "multi": [1, 3], "discret": [1, 2, 4], "action": [1, 2, 3, 4], "seed": [1, 4], "reset": [1, 2, 3, 4], "execut": [1, 3], "an": [1, 2, 3], "reward": [1, 2, 3, 4], "done": [1, 2], "info": [1, 2, 3], "render": [1, 4], "close": [1, 3, 4], "unpack_batched_ob": [1, 3], "batched_ob": 1, "pettingzoopufferenv": [1, 2, 3], "postprocessor": 1, "postprocessor_kwarg": 1, "team": [1, 2, 3], "agent": [1, 2, 3, 4], "single_observation_spac": [1, 3], "single_action_spac": [1, 3], "all": [1, 2, 3, 4], "includ": [1, 2, 3], "expos": 1, "make_env": [1, 3], "one": [1, 2, 3, 4], "you": [1, 2, 3, 4], "want": [1, 2, 3], "most": [1, 2, 3], "time": [1, 2, 3, 4], "other": [1, 2, 3], "us": [1, 2, 3, 4], "e": 1, "g": 1, "interfac": [1, 2, 3], "them": [1, 2, 3], "so": [1, 2, 3, 4], "static": [1, 4], "refer": 1, "addition": [1, 2, 3], "baselin": [1, 2, 3], "have": [1, 2, 3, 4], "custom": [1, 2, 3], "default": [1, 2, 3], "simpli": [1, 2], "befor": [1, 2, 3, 4], "appli": [1, 2], "linear": [1, 2, 3], "layer": [1, 2, 3], "atari": [1, 2, 3], "procgen": [1, 3], "neural": [1, 2, 3], "mmo": [1, 2, 3], "nethack": [1, 3], "minihack": [1, 3], "pokemon": [1, 3], "red": [1, 3], "reason": [1, 3], "exampl": [1, 2, 3], "below": [1, 2, 3], "everyth": [1, 3], "through": [1, 3], "__init__": [1, 2, 3], "call": [1, 2], "method": [1, 2], "ocean": [1, 3, 4], "distance_to_target": [1, 4], "1": [1, 2, 3, 4], "num_target": [1, 4], "start": [1, 2, 3, 4], "center": [1, 4], "grid": [1, 3, 4], "target": [1, 3, 4], "ar": [1, 2, 3, 4], "place": [1, 4], "perimet": [1, 4], "minu": [1, 4], "l": [1, 4], "inf": [1, 4], "distanc": [1, 4], "closest": [1, 4], "thi": [1, 2, 3, 4], "mean": [1, 2, 4], "vari": [1, 4], "from": [1, 2, 3, 4], "given": [1, 4], "alreadi": [1, 3, 4], "been": [1, 2, 4], "hit": [1, 4], "box": [1, 2, 3, 4], "grid_siz": [1, 4], "map": [1, 2, 4], "8": [1, 4], 
"which": [1, 2, 3, 4], "direct": [1, 2, 4], "paramet": [1, 2, 4], "number": [1, 2, 3, 4], "randomli": [1, 4], "gener": [1, 2, 3, 4], "acttyp": [1, 4], "obstyp": [1, 4], "torch": [1, 3], "alia": 1, "option": [1, 2, 3], "These": [1, 2, 3], "requir": [1, 2, 3, 4], "arg": 1, "ani": [1, 2, 3], "kwarg": 1, "pure": 1, "pytorch": [1, 2, 3], "base": [1, 2, 3], "spec": [1, 2], "allow": [1, 2, 3], "repackag": 1, "rl": [1, 2, 3], "framework": [1, 2, 3], "encode_observ": 1, "decode_act": 1, "s": [1, 2, 3, 4], "equival": [1, 2], "forward": [1, 2, 3], "structur": [1, 2, 3], "provid": [1, 2, 3], "flexibl": [1, 3], "lstm": 1, "between": [1, 2, 3, 4], "encod": [1, 3], "decod": [1, 3], "To": [1, 2, 3], "port": [1, 3], "put": 1, "recurr": [1, 2, 3], "cell": 1, "head": 1, "after": [1, 4], "delet": 1, "its": 1, "specif": [1, 2, 3], "wrapper": [1, 2, 3], "sinc": [1, 2, 3], "each": [1, 2, 3], "treat": 1, "tempor": 1, "data": [1, 2], "bit": [1, 2, 3], "differ": [1, 2, 3], "approach": [1, 2, 3], "let": [1, 2], "write": [1, 3], "network": [1, 3], "multipl": [1, 2, 3], "specifi": [1, 2, 3], "valu": [1, 3], "critic": 1, "batch": [1, 2], "element": 1, "It": [1, 2, 3, 4], "output": [1, 2, 4], "abstract": 1, "flat_observ": 1, "hidden": [1, 3], "state": [1, 2], "unflatten": [1, 2], "origin": [1, 2, 3], "form": 1, "self": [1, 2, 3], "structured_observation_spac": 1, "env_output": [1, 3], "shape": [1, 2, 3], "obs_siz": 1, "hidden_s": 1, "lookup": 1, "embed": 1, "type": 1, "flat_hidden": 1, "multidiscret": [1, 2], "action_s": 1, "concaten": 1, "logit": 1, "dimens": 1, "should": [1, 4], "sum": [1, 2], "nvec": [1, 3], "recurrentwrapp": 1, "x": 1, "meant": 1, "debug": [1, 2, 3], "run": [1, 2, 3, 4], "unlik": 1, "learn": [1, 2, 3, 4], "anyth": [1, 3], "relu": 1, "list": [1, 2, 3], "concat": 1, "true": [1, 2, 3], "convolut": 1, "stack": [1, 2], "three": 1, "follow": [1, 3, 4], "framestack": 1, "mandatori": 1, "keyword": 1, "argument": [1, 2, 3], "suggest": [1, 2, 3], "frame": 1, "without": [1, 2, 3, 4], 
"distribut": [1, 3], "backend": [1, 2, 3], "serial": [1, 2, 3], "callabl": 1, "dict": 1, "num_env": [1, 3], "int": 1, "envs_per_work": [1, 2, 3], "envs_per_batch": 1, "env_pool": [1, 3], "bool": 1, "fals": [1, 2, 3], "main": [1, 2], "process": [1, 2], "modul": [1, 2, 3], "flat_observation_spac": [1, 3], "ob": [1, 2, 3], "send": [1, 3], "recv": [1, 2, 3], "async_reset": [1, 2, 3], "profil": [1, 2], "get": [1, 2, 3, 4], "multiprocess": [1, 2, 3], "parallel": [1, 2], "applic": 1, "rai": [1, 2, 3], "cluster": 1, "also": [1, 2, 3, 4], "faster": [1, 2, 3], "than": [1, 2, 3], "machin": [1, 2, 3], "non": [1, 3], "get_valu": 1, "get_action_and_valu": [1, 3], "add": [1, 2, 3], "pretti": [1, 2], "simpl": [1, 2, 3], "see": [1, 2, 3], "recurrentpolici": 1, "minim": [1, 2], "cnn": 1, "py": 1, "shelv": 1, "until": 1, "stabl": [1, 2], "browser": [2, 3], "doe": [2, 3, 4], "video": [2, 3], "what": [2, 3], "reinforc": [2, 3], "cpu": 2, "wouldn": 2, "t": [2, 3], "pack": 2, "wai": [2, 3], "right": 2, "With": 2, "releas": [2, 3], "python": [2, 3], "implement": [2, 3], "solv": 2, "tl": 2, "dr": 2, "20": 2, "perform": 2, "improv": [2, 3], "across": 2, "workload": 2, "up": [2, 3], "2x": 2, "complex": [2, 3], "nativ": 2, "multiag": [2, 3], "If": [2, 3], "just": [2, 3], "enhanc": 2, "pip": [2, 3], "instal": [2, 3], "u": 2, "But": 2, "d": 2, "like": [2, 3], "behind": 2, "curtain": 2, "read": 2, "do": [2, 3, 4], "some": [2, 3], "research": [2, 3], "sai": 2, "1000": [2, 4], "second": [2, 4], "core": 2, "5000": 2, "6": 2, "now": 2, "decid": 2, "work": [2, 3], "interest": 2, "happen": [2, 4], "upon": [2, 4], "brilliant": 2, "project": [2, 3], "must": 2, "develop": [2, 3], "truli": 2, "fantast": 2, "1500": 2, "scale": [2, 3, 4], "1800": 2, "per": [2, 3], "give": [2, 3], "speed": 2, "even": 2, "thei": [2, 3, 4], "did": 2, "mani": [2, 3], "modern": 2, "overhead": 2, "mostli": 2, "launch": 2, "synchron": [2, 3], "roughli": 2, "ms": 2, "At": 2, "100": 2, "ignor": 2, "10": 2, "000": 2, "factor": 2, 
"varianc": 2, "defin": 2, "ratio": 2, "mu": 2, "std": 2, "root": 2, "For": [2, 3], "24": 2, "300": 2, "especi": 2, "intel": 2, "desktop": [2, 3], "seri": 2, "processor": 2, "featur": [2, 3], "slower": 2, "latenc": 2, "taken": 2, "transfer": 2, "gpu": [2, 3], "part": 2, "multiprocesss": 2, "naiv": 2, "leav": 2, "idl": 2, "dure": 2, "infer": 2, "As": 2, "rule": 2, "thumb": 2, "becaus": 2, "code": [2, 3, 4], "alwai": [2, 4], "thing": 2, "ones": [2, 3], "variabl": [2, 3], "depend": [2, 3], "On": 2, "hand": 2, "ha": [2, 3, 4], "effect": 2, "100x": 2, "fewer": 2, "lower": 2, "2000": 2, "sp": 2, "reduc": 2, "impact": 2, "over": [2, 3], "sampl": [2, 3], "In": 2, "typic": 2, "enabl": [2, 3], "onli": 2, "interact": 2, "optim": [2, 3, 4], "buffer": 2, "while": [2, 3], "techniqu": 2, "wa": [2, 3], "introduc": 2, "http": 2, "github": [2, 3], "com": 2, "alex": 2, "petrenko": 2, "factori": 2, "interleav": 2, "two": [2, 3], "set": [2, 3], "good": [2, 3], "trick": 2, "supersed": 2, "final": 2, "simpler": 2, "pool": [2, 3], "first": [2, 3, 4], "finish": 2, "might": 2, "64": 2, "comput": [2, 3], "go": 2, "still": [2, 3], "select": [2, 4], "fastest": 2, "40": 2, "sail": 2, "sg": 2, "massiv": [2, 3], "c": 2, "arbitrari": 2, "evalu": 2, "i": 2, "am": 2, "13900k": 2, "max": 2, "maingear": 2, "debian": 2, "12": 2, "test": [2, 3, 4], "9": 2, "1e": 2, "delai": [2, 4], "spawn": 2, "96": 2, "192": 2, "gymnasium": [2, 3], "separ": 2, "post": 2, "sweep": 2, "consid": 2, "yield": 2, "anoth": 2, "rel": [2, 3], "group": 2, "bar": 2, "gymasium": 2, "match": 2, "case": 2, "best": 2, "high": [2, 3], "standard": [2, 3, 4], "deviat": [2, 4], "import": [2, 3], "instanc": [2, 4], "mai": [2, 3], "copi": 2, "heavi": 2, "again": [2, 3], "save": 2, "dip": 2, "50": 2, "make": [2, 3, 4], "absolut": 2, "sure": 2, "unavoid": 2, "reimplement": 2, "entir": 2, "architectur": [2, 3], "possibl": 2, "10734": 2, "36": 2, "delay_mean": 2, "delay_std": 2, "num_work": 2, "batch_siz": 2, "sync": 2, "11640": 2, "42": [2, 
4], "32715": 2, "65": 2, "27635": 2, "31": 2, "22681": 2, "48": 2, "26183": 2, "73": 2, "30120": 2, "75": [2, 4], "turn": 2, "out": [2, 3], "cap": 2, "worker": [2, 3], "There": [2, 3], "room": 2, "small": [2, 4], "matter": 2, "much": [2, 3], "extrem": [2, 3], "concis": 2, "800": 2, "line": [2, 3], "ad": 2, "chang": [2, 3], "lot": 2, "don": [2, 3], "queue": 2, "fast": 2, "poll": 2, "instead": [2, 4], "pipe": 2, "selector": [2, 3], "seen": 2, "notic": 2, "my": [2, 3], "sleep": 2, "trigger": 2, "context": 2, "switch": 2, "spent": 2, "process_tim": 2, "equal": 2, "slice": 2, "count": 2, "150m": 2, "fix": [2, 4], "amount": 2, "easi": 2, "thank": 2, "wait": 2, "unfortun": 2, "too": [2, 4], "slow": [2, 3], "wish": 2, "itself": 2, "cleanup": 2, "issu": 2, "caus": 2, "crash": 2, "why": 2, "result": 2, "mention": 2, "peopl": 2, "look": [2, 3], "experiment": 2, "procedur": [2, 3], "stuff": 2, "program": 2, "paradigm": 2, "actual": 2, "subclass": 2, "plai": [2, 3, 4], "nice": [2, 3], "pain": 2, "free": [2, 3], "click": [2, 3], "colab": [2, 3], "new": [2, 3], "One": 2, "contain": [2, 3, 4], "preload": 2, "common": 2, "importantli": 2, "rewritten": 2, "simplic": 2, "extens": 2, "flashi": 2, "significantli": 2, "rough": 2, "edg": 2, "longer": 2, "convert": [2, 3], "intern": 2, "wysiwyg": 2, "previous": [2, 3], "creation": 2, "back": [2, 3], "benefit": 2, "describ": 2, "blog": 2, "nle": [2, 3], "nmmo": [2, 3], "def": [2, 3], "nmmo_creat": [2, 3], "nethack_cr": [2, 3], "gympufferenv": 2, "expect": 2, "abov": [2, 3], "prefer": 2, "directli": 2, "compar": 2, "vec": [2, 3], "Or": [2, 3], "async": [2, 3], "_": 2, "notori": [2, 3], "hard": 2, "sever": [2, 3], "popular": [2, 3], "onto": [2, 4], "imag": 2, "build": [2, 3], "coffe": 2, "break": [2, 3], "vanilla": [2, 3], "cleanrl_polici": [2, 3], "expens": 2, "runtim": 2, "could": 2, "disabl": 2, "o": 2, "inconveni": 2, "easili": 2, "forgotten": 2, "onc": 2, "startup": 2, "neglig": 2, "thu": 2, "far": [2, 3, 4], "bug": 2, "version": [2, 
3], "would": [2, 3], "caught": 2, "previou": 2, "sane": 2, "setup": [2, 3], "home": 2, "page": [2, 3], "updat": [2, 3], "written": 2, "bottleneck": 2, "studi": 2, "deriv": 2, "correctli": 2, "train": [2, 3, 4], "pad": 2, "longstand": 2, "challeng": 2, "join": [2, 3], "discord": [2, 3], "tell": 2, "point": 2, "goal": 2, "game": [2, 3], "preliminari": 2, "re": [2, 3], "excit": 2, "announc": 2, "dozen": 2, "better": 2, "streamlin": [2, 3], "understand": 2, "need": [2, 3], "determinist": 2, "fulli": [2, 3], "short": [2, 3], "horizon": [2, 3, 4], "contrast": 2, "nondeterminist": 2, "partial": 2, "larg": [2, 3], "popul": [2, 3], "hierarch": 2, "design": [2, 3], "mind": 2, "bia": 2, "toward": 2, "million": 2, "tackl": 2, "lead": 2, "focu": 2, "exclus": 2, "initi": 2, "ran": 2, "file": [2, 3], "proxim": 2, "ppo": [2, 3], "replac": 2, "eas": 2, "log": 2, "latest": 2, "doubl": 2, "asynchron": [2, 3], "samplefactori": 2, "paper": 2, "ensur": 2, "accuraci": 2, "maintain": [2, 3], "wandb": 2, "correct": 2, "kei": 2, "idea": 2, "appear": 2, "therebi": 2, "perspect": 2, "here": [2, 3], "size": 2, "conform": 2, "lose": 2, "inform": 2, "both": 2, "constant": [2, 3], "sort": 2, "order": 2, "subtleti": 2, "termin": [2, 3], "signal": [2, 4], "creat": 2, "straightforward": 2, "name": 2, "env_cl": 2, "env_nam": 2, "accept": 2, "certain": 2, "hook": [2, 3], "well": [2, 3], "abil": 2, "suppress": 2, "avoid": 2, "excess": 2, "split": [2, 3], "few": [2, 4], "prone": 2, "difficult": [2, 3], "finicki": [2, 3], "costli": 2, "vecenv": 2, "rayvecenv": 2, "num_cor": 2, "adher": [2, 3], "perfectli": 2, "quickli": 2, "becom": 2, "cumbersom": 2, "outsid": 2, "ideal": 2, "beyond": [2, 3], "eat": 2, "trace": 2, "access": 2, "remot": 2, "individu": 2, "shorter": 2, "convei": 2, "task": 2, "receiv": [2, 3], "suitabl": 2, "field": 2, "cover": 2, "subsequ": 2, "major": 2, "downsid": 2, "particularli": 2, "hundr": 2, "thousand": 2, "price": 2, "paid": 2, "larger": 2, "emploi": 2, "help": 2, "mitig": 2, 
"ultim": 2, "continu": [2, 3], "repres": 2, "tool": [2, 3], "plan": 2, "futur": [2, 3], "passthrough": 2, "area": 2, "algorithm": [2, 3, 4], "aim": 2, "commonli": 2, "histor": 2, "multiplay": 2, "skill": 2, "rate": 2, "curriculum": 2, "focus": 2, "mechan": 2, "earli": 2, "stage": 2, "howev": 2, "rapid": 2, "progress": 2, "conflict": 2, "old": 2, "slowli": 2, "increas": 2, "coverag": 2, "joseph": [2, 3], "suarez": [2, 3], "ryan": 2, "sullivan": 2, "feedback": 2, "togeth": 3, "commun": 3, "discuss": 3, "twitter": 3, "star": 3, "repo": 3, "feed": 3, "whitepap": 3, "neurip": 3, "2023": 3, "alo": 3, "workshop": 3, "registri": 3, "tricki": 3, "clone": 3, "repositori": 3, "open": 3, "vscode": 3, "dev": 3, "plugin": 3, "docker": 3, "detect": 3, "devcontain": 3, "avail": 3, "packag": 3, "contributor": 3, "david": 3, "bloomin": 3, "store": 3, "nick": 3, "jenkin": 3, "layout": 3, "system": 3, "diagram": 3, "adversari": 3, "andranik": 3, "tigranyan": 3, "anim": 3, "pufferfish": 3, "hire": 3, "him": 3, "upwork": 3, "sara": 3, "earl": 3, "her": 3, "guid": 3, "notebook": 3, "button": 3, "top": 3, "heirarch": 3, "quirk": 3, "incompat": 3, "everi": [3, 4], "flat": 3, "how": [3, 4], "pettingzootruncatedwrapp": 3, "compliant": 3, "loss": 3, "underli": 3, "otherwis": 3, "choos": 3, "varieti": 3, "share": [3, 4], "total": 3, "tweak": 3, "ship": 3, "care": 3, "nn": 3, "numpi": 3, "np": 3, "super": 3, "prod": 3, "128": 3, "modulelist": 3, "n": 3, "value_head": 3, "reshap": 3, "dec": 3, "driver_env": 3, "truncat": 3, "env_id": 3, "mask": 3, "reli": 3, "conveni": 3, "complet": 3, "almost": 3, "ll": 3, "unpack": 3, "flat_observation_structur": 3, "That": 3, "full": 3, "length": [3, 4], "script": 3, "come": 3, "soon": 3, "subject": 3, "ve": 3, "easier": 3, "portion": 3, "suit": [3, 4], "80": 3, "academ": 3, "about": 3, "view": 3, "heavili": 3, "manag": 3, "competit": 3, "try": 3, "purpos": 3, "industri": 3, "veri": 3, "buggi": 3, "situat": 3, "parti": [3, 4], "openai": 3, "built": 3, 
"box2d": 3, "gameboi": 3, "activ": 3, "butterfli": 3, "arcad": 3, "classic": [3, 4], "benchmark": 3, "minigrid": 3, "2d": 3, "world": 3, "engin": 3, "collect": 3, "builtin": 3, "computation": 3, "effici": 3, "magent": 3, "platform": 3, "combin": 3, "me": 3, "level": 3, "strip": 3, "down": 3, "edit": 3, "crafter": 3, "minecraft": 3, "pixel": 3, "long": [3, 4], "griddli": 3, "micrort": 3, "real": 3, "strategi": 3, "java": 3, "configur": 3, "No": 3, "wip": 3, "heterogen": 3, "softwar": 3, "under": 3, "mit": 3, "pufferai": 3, "branch": 3, "privat": 3, "scratch": 4, "30": 4, "minut": 4, "saniti": 4, "overnight": 4, "heavier": 4, "password_length": 4, "hard_fixed_se": 4, "guess": 4, "binari": 4, "string": 4, "determin": 4, "latch": 4, "within": 4, "solvabl": 4, "digit": 4, "sens": 4, "when": 4, "p": 4, "whether": 4, "nontrivi": 4, "trivial": 4, "probabl": 4, "mem_length": 4, "mem_delai": 4, "repeat": 4, "sequenc": 4, "capac": 4, "credit": 4, "assign": 4, "present": 4, "0s": 4, "respons": 4, "num_act": 4, "reward_scal": 4, "reward_nois": 4, "multiarm": 4, "arm": 4, "pull": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"index": 0, "user": 0, "guid": 0, "api": 0, "ocean": 0, "blog": 0, "emul": [1, 2], "environ": [1, 3, 4], "model": 1, "vector": [1, 2], "cleanrl": [1, 2], "integr": 1, "sb3": 1, "bind": 1, "rllib": 1, "pufferlib": 2, "0": 2, "5": 2, "A": 2, "bigger": 2, "envpool": 2, "grow": 2, "puffer": 2, "The": 2, "simul": 2, "crisi": 2, "solut": 2, "experi": 2, "technic": 2, "detail": 2, "gotcha": 2, "4": 2, "readi": 2, "take": 2, "fish": 2, "puffertank": 2, "polici": 2, "error": 2, "handl": 2, "miscellan": 2, "2": 2, "big": 2, "problem": 2, "statement": 2, "demo": 2, "next": 2, "step": 2, "librari": 3, "current": 3, "limit": 3, "licens": 3, "squar": 4, "password": 4, "explor": 4, "stochast": 4, "memori": 4, "bandit": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, 
"sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
+Search.setIndex({"docnames": ["index", "rst/api", "rst/blog", "rst/landing", "rst/ocean"], "filenames": ["index.rst", "rst/api.rst", "rst/blog.rst", "rst/landing.rst", "rst/ocean.rst"], "titles": ["Index", "Emulation", "\ud83d\udc21\ud83c\udf0a An Ocean of Environments for Learning Pufferfish", "Libraries", "Squared"], "terms": {"librari": [0, 2], "environ": 0, "current": [0, 1, 2, 4], "limit": [0, 2], "licens": 0, "emul": [0, 3], "model": [0, 2, 3], "vector": [0, 3, 4], "cleanrl": [0, 3], "integr": [0, 2, 3], "sb3": [0, 3], "bind": [0, 2], "rllib": [0, 2, 3], "squar": [0, 1, 2], "password": [0, 2], "explor": [0, 2], "stochast": [0, 2], "memori": [0, 2], "bandit": [0, 2], "an": [0, 1, 3], "learn": [0, 1, 3, 4], "pufferfish": [0, 3], "pufferlib": [0, 1, 3, 4], "0": [0, 3, 4], "5": [0, 4], "A": [0, 1, 4], "bigger": 0, "envpool": [0, 3], "grow": 0, "puffer": [0, 3], "The": [0, 1, 3, 4], "simul": [0, 1, 3, 4], "crisi": 0, "solut": 0, "experi": [0, 3], "technic": 0, "detail": [0, 3], "gotcha": 0, "4": [0, 1, 3, 4], "readi": 0, "take": [0, 1, 3], "fish": 0, "puffertank": [0, 3], "polici": [0, 1, 3, 4], "error": 0, "handl": [0, 1, 3], "miscellan": 0, "2": [0, 3, 4], "big": 0, "problem": [0, 4], "statement": 0, "demo": [0, 1, 3], "next": [0, 4], "step": [0, 1, 3, 4], "our": [1, 2, 3], "public": [1, 2, 3], "api": [1, 2, 3], "advanc": 1, "user": 1, "can": [1, 2, 3, 4], "check": [1, 2, 4], "sourc": [1, 2, 3], "addit": [1, 2, 3], "util": [1, 2, 3], "note": [1, 3], "we": [1, 2, 3], "tend": 1, "move": [1, 2, 4], "around": [1, 2, 3], "more": [1, 2], "often": [1, 4], "contribut": [1, 2], "welcom": 1, "wrap": [1, 2, 3], "your": [1, 2, 3, 4], "broad": 1, "compat": [1, 2], "support": [1, 2, 3], "pass": [1, 2, 3], "creator": [1, 3], "function": [1, 2, 3], "class": [1, 2, 3, 4], "env": [1, 2, 3], "object": [1, 2, 3], "return": [1, 2, 3], "pufferenv": [1, 2], "same": [1, 2, 3, 4], "gym": [1, 2, 3], "pettingzoo": [1, 2, 3], "gymnasiumpufferenv": [1, 3], "none": [1, 4], "env_creat": [1, 
2, 3], "env_arg": 1, "env_kwarg": 1, "postprocessor_cl": 1, "basicpostprocessor": 1, "properti": 1, "observation_spac": [1, 4], "flatten": [1, 2, 3], "singl": [1, 2, 3], "tensor": [1, 2, 3], "observ": [1, 2, 3, 4], "space": [1, 2, 3, 4], "action_spac": [1, 4], "multi": [1, 3], "discret": [1, 2, 4], "action": [1, 2, 3, 4], "seed": [1, 4], "reset": [1, 2, 3, 4], "execut": [1, 3], "reward": [1, 2, 3, 4], "done": [1, 2], "info": [1, 2, 3], "render": [1, 2, 4], "close": [1, 3, 4], "unpack_batched_ob": [1, 3], "batched_ob": 1, "pettingzoopufferenv": [1, 2, 3], "postprocessor": 1, "postprocessor_kwarg": 1, "team": [1, 2, 3], "agent": [1, 2, 3, 4], "single_observation_spac": [1, 3], "single_action_spac": [1, 3], "all": [1, 2, 3, 4], "includ": [1, 2, 3], "expos": 1, "make_env": [1, 3], "one": [1, 2, 3, 4], "you": [1, 2, 3, 4], "want": [1, 2, 3], "most": [1, 2, 3], "time": [1, 2, 3, 4], "other": [1, 2, 3], "us": [1, 2, 3, 4], "e": 1, "g": 1, "interfac": [1, 2, 3], "them": [1, 2, 3], "so": [1, 2, 3, 4], "static": [1, 4], "refer": 1, "addition": [1, 2, 3], "baselin": [1, 2, 3], "have": [1, 2, 3, 4], "custom": [1, 2, 3], "default": [1, 2, 3], "simpli": [1, 2], "befor": [1, 2, 3, 4], "appli": [1, 2], "linear": [1, 2, 3], "layer": [1, 2, 3], "atari": [1, 2, 3], "procgen": [1, 3], "neural": [1, 2, 3], "mmo": [1, 2, 3], "nethack": [1, 3], "minihack": [1, 3], "pokemon": [1, 3], "red": [1, 3], "reason": [1, 2, 3], "exampl": [1, 2, 3], "below": [1, 2, 3], "everyth": [1, 3], "through": [1, 3], "__init__": [1, 2, 3], "call": [1, 2], "method": [1, 2], "ocean": [1, 3, 4], "distance_to_target": [1, 4], "1": [1, 2, 3, 4], "num_target": [1, 4], "start": [1, 2, 3, 4], "center": [1, 2, 4], "grid": [1, 3, 4], "target": [1, 2, 3, 4], "ar": [1, 2, 3, 4], "place": [1, 2, 4], "perimet": [1, 4], "minu": [1, 4], "l": [1, 4], "inf": [1, 4], "distanc": [1, 4], "closest": [1, 4], "thi": [1, 2, 3, 4], "mean": [1, 2, 4], "vari": [1, 4], "from": [1, 2, 3, 4], "given": [1, 4], "alreadi": [1, 3, 4], "been": 
[1, 2, 4], "hit": [1, 2, 4], "box": [1, 2, 3, 4], "grid_siz": [1, 4], "map": [1, 2, 4], "8": [1, 4], "which": [1, 2, 3, 4], "direct": [1, 2, 4], "paramet": [1, 2, 4], "number": [1, 2, 3, 4], "randomli": [1, 4], "gener": [1, 2, 3, 4], "acttyp": [1, 4], "obstyp": [1, 4], "torch": [1, 3], "alia": 1, "option": [1, 2, 3], "These": [1, 2, 3], "requir": [1, 2, 3, 4], "arg": 1, "ani": [1, 2, 3], "kwarg": 1, "pure": 1, "pytorch": [1, 2, 3], "base": [1, 2, 3], "spec": [1, 2], "allow": [1, 2, 3], "repackag": 1, "rl": [1, 2, 3], "framework": [1, 2, 3], "encode_observ": 1, "decode_act": 1, "s": [1, 2, 3, 4], "equival": [1, 2], "forward": [1, 2, 3], "structur": [1, 2, 3], "provid": [1, 2, 3], "flexibl": [1, 3], "lstm": 1, "between": [1, 2, 3, 4], "encod": [1, 3], "decod": [1, 3], "To": [1, 2, 3], "port": [1, 3], "put": 1, "recurr": [1, 2, 3], "cell": 1, "head": 1, "after": [1, 2, 4], "delet": 1, "its": 1, "specif": [1, 2, 3], "wrapper": [1, 2, 3], "sinc": [1, 2, 3], "each": [1, 2, 3], "treat": 1, "tempor": 1, "data": [1, 2], "bit": [1, 2, 3], "differ": [1, 2, 3], "approach": [1, 2, 3], "let": [1, 2], "write": [1, 3], "network": [1, 2, 3], "multipl": [1, 2, 3], "specifi": [1, 2, 3], "valu": [1, 3], "critic": 1, "batch": [1, 2], "element": [1, 2], "It": [1, 2, 3, 4], "output": [1, 2, 4], "abstract": 1, "flat_observ": 1, "hidden": [1, 3], "state": [1, 2], "unflatten": [1, 2], "origin": [1, 2, 3], "form": 1, "self": [1, 2, 3], "structured_observation_spac": 1, "env_output": [1, 3], "shape": [1, 2, 3], "obs_siz": 1, "hidden_s": 1, "lookup": 1, "embed": 1, "type": 1, "flat_hidden": 1, "multidiscret": [1, 2], "action_s": 1, "concaten": 1, "logit": 1, "dimens": 1, "should": [1, 2, 4], "sum": [1, 2], "nvec": [1, 3], "recurrentwrapp": 1, "x": 1, "meant": 1, "debug": [1, 2, 3], "run": [1, 2, 3, 4], "unlik": 1, "anyth": [1, 3], "relu": 1, "list": [1, 2, 3], "concat": 1, "true": [1, 2, 3], "convolut": 1, "stack": [1, 2], "three": 1, "follow": [1, 3, 4], "framestack": 1, "mandatori": 1, 
"keyword": 1, "argument": [1, 2, 3], "suggest": [1, 2, 3], "frame": 1, "without": [1, 2, 3, 4], "distribut": [1, 2, 3], "backend": [1, 2, 3], "serial": [1, 2, 3], "callabl": 1, "dict": 1, "num_env": [1, 3], "int": 1, "envs_per_work": [1, 2, 3], "envs_per_batch": 1, "env_pool": [1, 3], "bool": 1, "fals": [1, 2, 3], "main": [1, 2], "process": [1, 2], "modul": [1, 2, 3], "flat_observation_spac": [1, 3], "ob": [1, 2, 3], "send": [1, 3], "recv": [1, 2, 3], "async_reset": [1, 2, 3], "profil": [1, 2], "get": [1, 2, 3, 4], "multiprocess": [1, 2, 3], "parallel": [1, 2], "applic": 1, "rai": [1, 2, 3], "cluster": 1, "also": [1, 2, 3, 4], "faster": [1, 2, 3], "than": [1, 2, 3], "machin": [1, 2, 3], "non": [1, 3], "get_valu": 1, "get_action_and_valu": [1, 3], "add": [1, 2, 3], "pretti": [1, 2], "simpl": [1, 2, 3], "see": [1, 2, 3], "recurrentpolici": 1, "minim": [1, 2], "cnn": 1, "py": 1, "shelv": 1, "until": 1, "stabl": [1, 2], "browser": [2, 3], "doe": [2, 3, 4], "video": [2, 3], "small": [2, 4], "suit": [2, 3, 4], "train": [2, 3, 4], "scratch": [2, 4], "30": [2, 4], "second": [2, 4], "termin": [2, 3], "saniti": [2, 4], "common": 2, "implement": [2, 3], "bug": 2, "quick": 2, "verif": 2, "test": [2, 3, 4], "whenev": 2, "make": [2, 3, 4], "code": [2, 3, 4], "chang": [2, 3], "shown": 2, "binari": [2, 4], "token": 2, "must": 2, "recit": 2, "back": [2, 3], "paus": 2, "do": [2, 3, 4], "sequenc": [2, 4], "too": [2, 4], "long": [2, 3, 4], "credit": [2, 4], "assign": [2, 4], "particular": 2, "nondeterminist": 2, "architectur": [2, 3], "solv": 2, "task": 2, "guess": [2, 4], "tune": 2, "entropi": 2, "coeffici": 2, "higher": 2, "would": [2, 3], "actual": 2, "point": 2, "multiarm": [2, 4], "histor": 2, "import": [2, 3], "set": [2, 3], "spawn": 2, "edg": 2, "There": [2, 3], "separ": 2, "jointli": 2, "help": 2, "prod": [2, 3], "deeper": 2, "issu": 2, "project": [2, 3], "heavili": [2, 3], "inspir": 2, "bsuit": 2, "deepmind": 2, "similar": 2, "benchmarki": 2, "goal": 2, "wa": [2, 3], "heavi": 
2, "my": [2, 3], "like": [2, 3], "didn": 2, "t": [2, 3], "fit": 2, "nich": 2, "portabl": 2, "i": 2, "had": 2, "few": [2, 4], "design": [2, 3], "appar": 2, "standard": [2, 3, 4], "rnn": 2, "copi": 2, "surpris": 2, "weren": 2, "But": 2, "context": 2, "becaus": 2, "still": [2, 3], "don": [2, 3], "think": 2, "wai": [2, 3], "fulli": [2, 3], "isol": 2, "onli": 2, "outsid": 2, "try": [2, 3], "increas": 2, "length": [2, 3, 4], "delai": [2, 4], "quickli": 2, "find": 2, "harder": 2, "just": [2, 3], "work": [2, 3], "about": [2, 3], "rate": 2, "expect": 2, "onc": 2, "happen": [2, 4], "right": 2, "instanc": [2, 4], "much": [2, 3], "possibl": 2, "priorit": 2, "replai": 2, "trivial": [2, 4], "took": 2, "longest": 2, "initi": 2, "look": [2, 3], "where": 2, "optim": [2, 3, 4], "nontrivi": [2, 4], "even": 2, "could": 2, "figur": 2, "out": [2, 3], "how": [2, 3, 4], "twitter": [2, 3], "seem": 2, "imposs": 2, "thei": [2, 3, 4], "re": [2, 3], "probabl": [2, 4], "though": 2, "might": 2, "abl": 2, "alter": 2, "setup": [2, 3], "condit": 2, "thing": 2, "someth": 2, "better": 2, "For": [2, 3], "now": 2, "consist": 2, "wrote": 2, "earlier": 2, "kind": 2, "left": 2, "releas": [2, 3], "good": [2, 3], "idea": 2, "least": 2, "some": [2, 3], "version": [2, 3], "easili": 2, "access": 2, "over": [2, 3], "summer": 2, "m": 2, "rather": 2, "fond": 2, "fairli": 2, "scalabl": 2, "first": [2, 3, 4], "teleport": 2, "equal": 2, "toward": 2, "determinist": 2, "select": [2, 4], "interest": 2, "turn": 2, "In": 2, "event": 2, "combin": [2, 3], "simpler": 2, "tunabl": 2, "interpret": 2, "me": [2, 3], "know": 2, "late": 2, "ve": [2, 3], "land": 2, "either": 2, "veri": [2, 3], "complex": [2, 3], "being": 2, "research": [2, 3], "mani": [2, 3], "middl": 2, "slow": [2, 3], "what": [2, 3], "reinforc": [2, 3], "cpu": 2, "wouldn": 2, "pack": 2, "With": 2, "python": [2, 3], "tl": 2, "dr": 2, "20": 2, "perform": 2, "improv": [2, 3], "across": 2, "workload": 2, "up": [2, 3], "2x": 2, "nativ": 2, "multiag": [2, 3], "If": 
[2, 3], "enhanc": 2, "pip": [2, 3], "instal": [2, 3], "u": 2, "d": 2, "behind": 2, "curtain": 2, "read": 2, "sai": 2, "1000": [2, 4], "core": 2, "5000": 2, "6": 2, "decid": 2, "upon": [2, 4], "brilliant": 2, "develop": [2, 3], "truli": 2, "fantast": 2, "1500": 2, "scale": [2, 3, 4], "1800": 2, "per": [2, 3], "give": [2, 3], "speed": 2, "did": 2, "modern": 2, "overhead": 2, "mostli": 2, "launch": 2, "synchron": [2, 3], "roughli": 2, "ms": 2, "At": 2, "100": 2, "ignor": 2, "10": 2, "000": 2, "factor": 2, "varianc": 2, "defin": 2, "ratio": 2, "mu": 2, "std": 2, "root": 2, "24": 2, "300": 2, "especi": 2, "intel": 2, "desktop": [2, 3], "seri": 2, "processor": 2, "featur": [2, 3], "slower": 2, "latenc": 2, "taken": 2, "transfer": 2, "gpu": [2, 3], "part": 2, "multiprocesss": 2, "naiv": 2, "leav": 2, "idl": 2, "dure": 2, "infer": 2, "As": 2, "rule": 2, "thumb": 2, "alwai": [2, 4], "ones": [2, 3], "variabl": [2, 3], "depend": [2, 3], "On": 2, "hand": 2, "ha": [2, 3, 4], "effect": 2, "100x": 2, "fewer": 2, "lower": 2, "2000": 2, "sp": 2, "reduc": 2, "impact": 2, "sampl": [2, 3], "typic": 2, "enabl": [2, 3], "interact": 2, "buffer": 2, "while": [2, 3], "techniqu": 2, "introduc": 2, "http": 2, "github": [2, 3], "com": 2, "alex": 2, "petrenko": 2, "factori": 2, "interleav": 2, "two": [2, 3], "trick": 2, "supersed": 2, "final": 2, "pool": [2, 3], "finish": 2, "64": 2, "comput": [2, 3], "go": 2, "fastest": 2, "40": 2, "sail": 2, "sg": 2, "massiv": [2, 3], "c": 2, "arbitrari": 2, "evalu": 2, "am": 2, "13900k": 2, "max": 2, "maingear": 2, "debian": 2, "12": 2, "9": 2, "1e": 2, "96": 2, "192": 2, "gymnasium": [2, 3], "post": 2, "sweep": 2, "consid": 2, "yield": 2, "anoth": 2, "rel": [2, 3], "group": 2, "bar": 2, "gymasium": 2, "match": 2, "case": 2, "best": 2, "high": [2, 3], "deviat": [2, 4], "mai": [2, 3], "again": [2, 3], "save": 2, "dip": 2, "50": 2, "absolut": 2, "sure": 2, "unavoid": 2, "reimplement": 2, "entir": 2, "10734": 2, "36": 2, "delay_mean": 2, "delay_std": 2, 
"num_work": 2, "batch_siz": 2, "sync": 2, "11640": 2, "42": [2, 4], "32715": 2, "65": 2, "27635": 2, "31": 2, "22681": 2, "48": 2, "26183": 2, "73": 2, "30120": 2, "75": [2, 4], "cap": 2, "worker": [2, 3], "room": 2, "matter": 2, "extrem": [2, 3], "concis": 2, "800": 2, "line": [2, 3], "ad": 2, "lot": 2, "queue": 2, "fast": 2, "poll": 2, "instead": [2, 4], "pipe": 2, "selector": [2, 3], "seen": 2, "notic": 2, "sleep": 2, "trigger": 2, "switch": 2, "spent": 2, "process_tim": 2, "slice": 2, "count": 2, "150m": 2, "fix": [2, 4], "amount": 2, "easi": 2, "thank": 2, "wait": 2, "unfortun": 2, "wish": 2, "itself": 2, "cleanup": 2, "caus": 2, "crash": 2, "why": 2, "result": 2, "mention": 2, "peopl": 2, "experiment": 2, "procedur": [2, 3], "stuff": 2, "program": 2, "paradigm": 2, "subclass": 2, "plai": [2, 3, 4], "nice": [2, 3], "pain": 2, "free": [2, 3], "click": [2, 3], "colab": [2, 3], "new": [2, 3], "One": 2, "contain": [2, 3, 4], "preload": 2, "importantli": 2, "rewritten": 2, "simplic": 2, "extens": 2, "flashi": 2, "significantli": 2, "rough": 2, "longer": 2, "convert": [2, 3], "intern": 2, "wysiwyg": 2, "previous": [2, 3], "creation": 2, "benefit": 2, "describ": 2, "blog": 2, "nle": [2, 3], "nmmo": [2, 3], "def": [2, 3], "nmmo_creat": [2, 3], "nethack_cr": [2, 3], "gympufferenv": 2, "abov": [2, 3], "prefer": 2, "directli": 2, "compar": 2, "vec": [2, 3], "Or": [2, 3], "async": [2, 3], "_": 2, "notori": [2, 3], "hard": 2, "sever": [2, 3], "popular": [2, 3], "onto": [2, 4], "imag": 2, "build": [2, 3], "coffe": 2, "break": [2, 3], "vanilla": [2, 3], "cleanrl_polici": [2, 3], "expens": 2, "runtim": 2, "disabl": 2, "o": 2, "inconveni": 2, "forgotten": 2, "startup": 2, "neglig": 2, "thu": 2, "far": [2, 3, 4], "caught": 2, "previou": 2, "sane": 2, "home": 2, "page": [2, 3], "updat": [2, 3], "written": 2, "bottleneck": 2, "studi": 2, "deriv": 2, "correctli": 2, "pad": 2, "longstand": 2, "challeng": 2, "join": [2, 3], "discord": [2, 3], "tell": 2, "game": [2, 3], 
"preliminari": 2, "excit": 2, "announc": 2, "dozen": 2, "streamlin": [2, 3], "understand": 2, "need": [2, 3], "short": [2, 3], "horizon": [2, 3, 4], "contrast": 2, "partial": 2, "larg": [2, 3], "popul": [2, 3], "hierarch": 2, "mind": 2, "bia": 2, "million": 2, "tackl": 2, "lead": 2, "focu": 2, "exclus": 2, "ran": 2, "file": [2, 3], "proxim": 2, "ppo": [2, 3], "replac": 2, "eas": 2, "log": 2, "latest": 2, "doubl": 2, "asynchron": [2, 3], "samplefactori": 2, "paper": 2, "ensur": 2, "accuraci": 2, "maintain": [2, 3], "wandb": 2, "correct": 2, "kei": 2, "appear": 2, "therebi": 2, "perspect": 2, "here": [2, 3], "size": 2, "conform": 2, "lose": 2, "inform": 2, "both": 2, "constant": [2, 3], "sort": 2, "order": 2, "subtleti": 2, "signal": [2, 4], "creat": 2, "straightforward": 2, "name": 2, "env_cl": 2, "env_nam": 2, "accept": 2, "certain": 2, "hook": [2, 3], "well": [2, 3], "abil": 2, "suppress": 2, "avoid": 2, "excess": 2, "split": [2, 3], "prone": 2, "difficult": [2, 3], "finicki": [2, 3], "costli": 2, "vecenv": 2, "rayvecenv": 2, "num_cor": 2, "adher": [2, 3], "perfectli": 2, "becom": 2, "cumbersom": 2, "ideal": 2, "beyond": [2, 3], "eat": 2, "trace": 2, "remot": 2, "individu": 2, "shorter": 2, "convei": 2, "receiv": [2, 3], "suitabl": 2, "field": 2, "cover": 2, "subsequ": 2, "major": 2, "downsid": 2, "particularli": 2, "hundr": 2, "thousand": 2, "price": 2, "paid": 2, "larger": 2, "emploi": 2, "mitig": 2, "ultim": 2, "continu": [2, 3], "repres": 2, "tool": [2, 3], "plan": 2, "futur": [2, 3], "passthrough": 2, "area": 2, "algorithm": [2, 3, 4], "aim": 2, "commonli": 2, "multiplay": 2, "skill": 2, "curriculum": 2, "focus": 2, "mechan": 2, "earli": 2, "stage": 2, "howev": 2, "rapid": 2, "progress": 2, "conflict": 2, "old": 2, "slowli": 2, "coverag": 2, "joseph": [2, 3], "suarez": [2, 3], "ryan": 2, "sullivan": 2, "feedback": 2, "togeth": 3, "commun": 3, "discuss": 3, "star": 3, "repo": 3, "feed": 3, "whitepap": 3, "neurip": 3, "2023": 3, "alo": 3, "workshop": 3, 
"registri": 3, "tricki": 3, "clone": 3, "repositori": 3, "open": 3, "vscode": 3, "dev": 3, "plugin": 3, "docker": 3, "detect": 3, "devcontain": 3, "avail": 3, "packag": 3, "contributor": 3, "david": 3, "bloomin": 3, "store": 3, "nick": 3, "jenkin": 3, "layout": 3, "system": 3, "diagram": 3, "adversari": 3, "andranik": 3, "tigranyan": 3, "anim": 3, "hire": 3, "him": 3, "upwork": 3, "sara": 3, "earl": 3, "her": 3, "guid": 3, "notebook": 3, "button": 3, "top": 3, "heirarch": 3, "quirk": 3, "incompat": 3, "everi": [3, 4], "flat": 3, "pettingzootruncatedwrapp": 3, "compliant": 3, "loss": 3, "underli": 3, "otherwis": 3, "choos": 3, "varieti": 3, "share": [3, 4], "total": 3, "tweak": 3, "ship": 3, "care": 3, "nn": 3, "numpi": 3, "np": 3, "super": 3, "128": 3, "modulelist": 3, "n": 3, "value_head": 3, "reshap": 3, "dec": 3, "driver_env": 3, "truncat": 3, "env_id": 3, "mask": 3, "reli": 3, "conveni": 3, "complet": 3, "almost": 3, "ll": 3, "unpack": 3, "flat_observation_structur": 3, "That": 3, "full": 3, "script": 3, "come": 3, "soon": 3, "subject": 3, "easier": 3, "portion": 3, "80": 3, "academ": 3, "view": 3, "manag": 3, "competit": 3, "purpos": 3, "industri": 3, "buggi": 3, "situat": 3, "parti": [3, 4], "openai": 3, "built": 3, "box2d": 3, "gameboi": 3, "activ": 3, "butterfli": 3, "arcad": 3, "classic": [3, 4], "benchmark": 3, "minigrid": 3, "2d": 3, "world": 3, "engin": 3, "collect": 3, "builtin": 3, "computation": 3, "effici": 3, "magent": 3, "platform": 3, "level": 3, "strip": 3, "down": 3, "edit": 3, "crafter": 3, "minecraft": 3, "pixel": 3, "griddli": 3, "micrort": 3, "real": 3, "strategi": 3, "java": 3, "configur": 3, "No": 3, "wip": 3, "heterogen": 3, "softwar": 3, "under": 3, "mit": 3, "pufferai": 3, "branch": 3, "privat": 3, "minut": 4, "overnight": 4, "heavier": 4, "password_length": 4, "hard_fixed_se": 4, "string": 4, "determin": 4, "latch": 4, "within": 4, "solvabl": 4, "digit": 4, "sens": 4, "when": 4, "p": 4, "whether": 4, "mem_length": 4, "mem_delai": 4, 
"repeat": 4, "capac": 4, "present": 4, "0s": 4, "respons": 4, "num_act": 4, "reward_scal": 4, "reward_nois": 4, "arm": 4, "pull": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"index": 0, "user": 0, "guid": 0, "api": 0, "ocean": [0, 2], "blog": 0, "emul": [1, 2], "environ": [1, 2, 3, 4], "model": 1, "vector": [1, 2], "cleanrl": [1, 2], "integr": 1, "sb3": 1, "bind": 1, "rllib": 1, "an": 2, "learn": 2, "pufferfish": 2, "pufferlib": 2, "0": 2, "5": 2, "A": 2, "bigger": 2, "envpool": 2, "grow": 2, "puffer": 2, "The": 2, "simul": 2, "crisi": 2, "solut": 2, "experi": 2, "technic": 2, "detail": 2, "gotcha": 2, "4": 2, "readi": 2, "take": 2, "fish": 2, "puffertank": 2, "polici": 2, "error": 2, "handl": 2, "miscellan": 2, "2": 2, "big": 2, "problem": 2, "statement": 2, "demo": 2, "next": 2, "step": 2, "librari": 3, "current": 3, "limit": 3, "licens": 3, "squar": 4, "password": 4, "explor": 4, "stochast": 4, "memori": 4, "bandit": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
diff --git a/docs/source/resource/ocean.png b/docs/source/resource/ocean.png
new file mode 100755
index 0000000..2eff103
Binary files /dev/null and b/docs/source/resource/ocean.png differ
diff --git a/docs/source/rst/blog.rst b/docs/source/rst/blog.rst
index cfea71b..08f602d 100644
--- a/docs/source/rst/blog.rst
+++ b/docs/source/rst/blog.rst
@@ -11,6 +11,39 @@
      </video>
    </center>
 
+🐡🌊 An Ocean of Environments for Learning Pufferfish
+#####################################################
+
+Ocean is a small suite of environments that train from scratch in 30 seconds and render in a terminal. Each environment is a sanity check for a common implementation bug. Use Ocean as a quick verification test whenever you make small code changes.
+
+.. image:: ../resource/ocean.png
+   :width: 100%
+   :align: center
+
+**Memory:** The agent is shown a sequence of binary tokens one at a time and must recite it back after a pause. Do not make the sequence too long or you start testing credit assignment.
+
+**Stochasticity:** The agent is rewarded for learning a particular nondeterministic action distribution. Do not use an architecture with memory or the agent can solve the task without stochasticity.
+
+**Exploration:** The agent is rewarded for guessing a specific binary sequence. Do not tune your entropy coefficients higher than you would use in your actual environments, since the point of the test is to check exploration under the settings you actually train with.
+
+**Bandit:** The agent is rewarded for solving a multi-armed bandit problem. This environment is included for historical importance. Any reasonable implementation should solve the default setting.
+
+**Squared:** The agent is rewarded for moving to targets that spawn around the edges of a square. There are settings to test memory, exploration, and stochasticity separately or jointly to help you prod at deeper issues with your implementation.
+
+This project is heavily inspired by BSuite, a DeepMind project with similar if more benchmarky goals. BSuite was a bit too heavy for my liking and didn’t fit the niche of a quick and portable verification suite.
+
+I had a few issues designing these. The memory task is apparently a standard RNN copying task (I would be surprised if it weren’t). But it’s a bit different in an RL context because you still have to learn credit assignment. I don’t think there is a way to fully isolate learning only memory outside of a simple 1-step problem. Try increasing the memory sequence length or delay and you will quickly find that the problem gets harder to learn.
+
+The exploration environment is the only one that just worked. You can increase the password length and the problem gets harder to learn at about the rate you would expect. It’s just a guess and check, so once you happen to get the password right, the goal is to learn as much as possible from that single instance. Any prioritized replay would trivialize the problem.
+
+The stochastic environment took the longest. Initially, I was looking for one where the optimal policy was still stochastic and nontrivial even if the agent had memory. I could not figure out how to make one of these, and Twitter seems to think it’s impossible. They’re probably right, though you might be able to alter the setup conditions a bit, still test for the same thing, and have something that works better. For now, this is a quick and consistent test.
+
+I wrote the bandit environment earlier in the project, and it seems kind of useful, so I left it in the release. Probably a good idea to have at least some version of a problem this historically important easily accessible in PufferLib.
+
+I wrote Squared over the summer. I’m rather fond of it as a test environment, since it is fairly scalable. You spawn at the center of a square and targets spawn around the outside. You get a reward the first time you hit each target and are teleported to the center whenever you hit a target. This means that the optimal policy is stochastic: you place equal probability on moving towards each target and then deterministically move towards the target you have selected. It’s interesting because the optimal policy is stochastic in some states and deterministic in others. You can also turn the problem into a memory test by using a recurrent network. In any event, it’s similar to the bandit problem in that it combines elements of the simpler tests, but it’s a bit more tunable and interpretable.
+
+Let me know if you have other ideas for useful test environments. Lately, I’ve landed on either very simple or very complex environments as being the most useful for research. Many of the tasks in the middle (looking at you, Atari) are too slow to be useful as quick tests and too simple to test interesting ideas.
+
 PufferLib 0.5: A Bigger EnvPool for Growing Puffers
 ###################################################
 
diff --git a/docs/source/rst/ocean.rst b/docs/source/rst/ocean.rst
index 41746b0..3277f1c 100644
--- a/docs/source/rst/ocean.rst
+++ b/docs/source/rst/ocean.rst
@@ -4,6 +4,8 @@
 
 🌊 Ocean is PufferLib's suite of first-party environments. They are small and can be trained from scratch in 30 seconds to 2 minutes. Use Ocean as a sanity check for your training code instead of overnighting heavier runs.
 
+.. image:: /resource/ocean.png
+
 Squared
 *******