diff --git a/blog/2024/08/31/ppswor.html b/blog/2024/08/31/ppswor.html index 8f844780e..645987144 100644 --- a/blog/2024/08/31/ppswor.html +++ b/blog/2024/08/31/ppswor.html @@ -319,6 +319,28 @@
I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.
+When I posted about this on Mastodon, Thomas Lumley drew my attention to the sampling
library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their UPbrewer
function in a little function to work with my comparison. Unlike me they had read Brewer’s actual article, and their implementation of course works perfectly:
So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or more likely was some other method he proposed separately).
+ +For the record here’s the little convenience function to test the Tillé/Matei implementation:
+ + + +It’s much faster too!
diff --git a/feed.r.xml b/feed.r.xml index 1ab5c4a30..46ed625c4 100644 --- a/feed.r.xml +++ b/feed.r.xml @@ -167,6 +167,28 @@ <p>I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.</p> +<h2 id="late-addition">Late addition!</h2> + +<p>When I posted about this on Mastodon, Thomas Lumley drew my attention to the <code class="language-plaintext highlighter-rouge">sampling</code> library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their <code class="language-plaintext highlighter-rouge">UPbrewer</code> function in a little function to work with my comparison. Unlike me they had read Brewer’s actual article, and their implementation of course works perfectly:</p> + +<object type="image/svg+xml" data="https://freerangestats.info/img/0276-20-10-brewer-better.svg" width="100%"><img src="https://freerangestats.info/img/0276-20-10-brewer-better.png" width="100%" /></object> + +<p>So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or more likely was some other method he proposed separately).</p> + +<p>For the record here’s the little convenience function to test the Tillé/Matei implementation:</p> + +<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#--------------using sampling library------------------</span><span class="w"> +</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w"> +</span><span class="n">sample_brewer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span 
class="n">size</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">keep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">){</span><span class="w"> + </span><span class="n">pik</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">prob</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">size</span><span class="w"> + </span><span class="n">s</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">UPbrewer</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w"> + </span><span class="n">the_sample</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">)]</span><span class="w"> + </span><span class="nf">return</span><span class="p">(</span><span class="n">the_sample</span><span class="p">)</span><span class="w"> + </span><span class="p">}</span><span class="w"> + +</span><span class="n">compare_ppswor</span><span class="p">(</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sample_brewer</span><span class="p">,</span><span class="w"> </span><span class="n">reps</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10000</span><span class="p">)</span></code></pre></figure> + +<p>It’s <em>much</em> faster too!</p>