Skip to content

Commit

Permalink
late addition
Browse files Browse the repository at this point in the history
  • Loading branch information
ellisp committed Aug 31, 2024
1 parent 420ec5b commit 74b3b92
Show file tree
Hide file tree
Showing 4 changed files with 146 additions and 0 deletions.
22 changes: 22 additions & 0 deletions blog/2024/08/31/ppswor.html
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,28 @@ <h2 id="brewers-1975-algorithm">Brewer’s 1975 algorithm</h2>

<p>I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.</p>

<h2 id="late-addition">Late addition!</h2>

<p>When I posted about this on Mastodon, Thomas Lumley drew my attention to the <code class="language-plaintext highlighter-rouge">sampling</code> library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their <code class="language-plaintext highlighter-rouge">UPbrewer</code> function in a little function to work with my comparison. Unlike me they had read Brewer’s actual article, and their implementation of course works perfectly:</p>

<object type="image/svg+xml" data="/img/0276-20-10-brewer-better.svg" width="100%"><img src="/img/0276-20-10-brewer-better.png" width="100%" /></object>

<p>So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or more likely was some other method he proposed separately).</p>

<p>For the record here’s the little convenience function to test the Tillé/Matei implementation:</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#--------------using sampling library------------------</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">sample_brewer</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">keep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">){</span><span class="w">
</span><span class="n">pik</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">prob</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">size</span><span class="w">
</span><span class="n">s</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">UPbrewer</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w">
</span><span class="n">the_sample</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">)]</span><span class="w">
</span><span class="nf">return</span><span class="p">(</span><span class="n">the_sample</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">compare_ppswor</span><span class="p">(</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sample_brewer</span><span class="p">,</span><span class="w"> </span><span class="n">reps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10000</span><span class="p">)</span></code></pre></figure>

<p>I’ts <em>much</em> faster too!</p>



Expand Down
22 changes: 22 additions & 0 deletions feed.r.xml
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,28 @@

&lt;p&gt;I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.&lt;/p&gt;

&lt;h2 id=&quot;late-addition&quot;&gt;Late addition!&lt;/h2&gt;

&lt;p&gt;When I posted about this on Mastodon, Thomas Lumley drew my attention to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sampling&lt;/code&gt; library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UPbrewer&lt;/code&gt; function in a little function to work with my comparison. Unlike me they had read Brewer’s actual article, and their implementation of course works perfectly:&lt;/p&gt;

&lt;object type=&quot;image/svg+xml&quot; data=&quot;https://freerangestats.info/img/0276-20-10-brewer-better.svg&quot; width=&quot;100%&quot;&gt;&lt;img src=&quot;https://freerangestats.info/img/0276-20-10-brewer-better.png&quot; width=&quot;100%&quot; /&gt;&lt;/object&gt;

&lt;p&gt;So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or more likely was some other method he proposed separately).&lt;/p&gt;

&lt;p&gt;For the record here’s the little convenience function to test the Tillé/Matei implementation:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span class=&quot;c1&quot;&gt;#--------------using sampling library------------------&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sampling&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample_brewer&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;FALSE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keep&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;FALSE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pik&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prob&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UPbrewer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pik&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;the_sample&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;which&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;the_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compare_ppswor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FUN&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sample_brewer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reps&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;10000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I’ts &lt;em&gt;much&lt;/em&gt; faster too!&lt;/p&gt;
</description>
<pubDate>Sat, 31 Aug 2024 00:00:00 +1100</pubDate>
<link>https://freerangestats.info/blog/2024/08/31/ppswor</link>
Expand Down
Binary file added img/0276-20-10-brewer-better.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
102 changes: 102 additions & 0 deletions img/0276-20-10-brewer-better.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 74b3b92

Please sign in to comment.