diff --git a/blog/2024/08/31/ppswor.html b/blog/2024/08/31/ppswor.html index 8f844780e..645987144 100644 --- a/blog/2024/08/31/ppswor.html +++ b/blog/2024/08/31/ppswor.html @@ -319,6 +319,28 @@

Brewer’s 1975 algorithm

I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.

+

Late addition!

+ +

When I posted about this on Mastodon, Thomas Lumley drew my attention to the sampling library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look, because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their UPbrewer function in a little function to work with my comparison. Unlike me, they had read Brewer’s actual article, and their implementation of course works perfectly:

+ + + +

So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or, more likely, it was some other method he proposed separately).

+ +

For the record, here’s the little convenience function to test the Tillé/Matei implementation:

+ +
#--------------using sampling library------------------
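+library(sampling)
+# Thin wrapper around UPbrewer(); `replace` and `keep` are accepted
+# but not used in the body.
+sample_brewer <- function(x, size, prob, replace = FALSE, keep = FALSE){
+ # scale the weights so they sum to `size` (target inclusion probabilities)
+ pik <- prob / sum(prob) * size
+ # UPbrewer() returns a 0/1 indicator per unit (1 = selected)
+ s <- UPbrewer(pik)
+ the_sample <- x[which(s == 1)]
+ return(the_sample)
+ }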
+
+compare_ppswor(FUN = sample_brewer, reps = 10000)
+ +
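The line `pik <- prob / sum(prob) * size` in the function above is worth a quick arithmetic check. What follows is a minimal base-R sketch, not part of the original comparison: the weights `1:20` are made-up illustrative values, chosen only to match the population of 20 and sample size of 10. It shows that the scaled weights sum to the sample size and that each one is a valid probability, which would stop being true if one unit's weight were large enough to push its scaled value above 1.

```r
# Hypothetical weights for a population of 20 and a sample of 10
# (illustrative values only, not the ones used in the post).
prob <- 1:20
size <- 10

# Same scaling as in sample_brewer(): rescale the weights so they sum to `size`.
pik <- prob / sum(prob) * size

# The target inclusion probabilities add up to the sample size ...
stopifnot(isTRUE(all.equal(sum(pik), size)))

# ... and each one has to be a valid probability, i.e. in (0, 1].
stopifnot(all(pik > 0 & pik <= 1))

# Smallest and largest target inclusion probability
round(range(pik), 3)
```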

It’s much faster too!
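For anyone who wants to repeat this kind of check on a sampler of their own, here is a rough sketch of a comparison loop. It is a hypothetical stand-in, not the `compare_ppswor` function used in this post: the name `selection_proportions`, its arguments, and the weights are all assumptions made for illustration. It takes any function with the arguments `x`, `size` and `prob`, draws repeated samples, and returns each unit's share of all selections, which can then be set against the original weights.

```r
# Hypothetical helper (NOT the post's compare_ppswor): estimate how often each
# unit is selected by a sampler that can be called as FUN(x, size = , prob = ).
selection_proportions <- function(FUN, prob, size, reps = 1000) {
  x <- seq_along(prob)                 # units are labelled 1..N
  counts <- integer(length(prob))
  for (i in seq_len(reps)) {
    chosen <- FUN(x, size = size, prob = prob)
    counts[chosen] <- counts[chosen] + 1L
  }
  # Share of all selections that went to each unit; sums to 1 by construction
  counts / sum(counts)
}

set.seed(123)
prob <- (1:20) / sum(1:20)             # illustrative weights summing to 1
actual <- selection_proportions(sample, prob = prob, size = 10)

# Original weights next to the observed share of selections
head(data.frame(prob = round(prob, 3), actual = round(actual, 3)))
```

Passing `sample_brewer` instead of `sample` should work the same way, provided the `sampling` package is installed.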

diff --git a/feed.r.xml b/feed.r.xml index 1ab5c4a30..46ed625c4 100644 --- a/feed.r.xml +++ b/feed.r.xml @@ -167,6 +167,28 @@ <p>I guess the meta-lesson is to watch out for things you’re assuming without noticing you are.</p> +<h2 id="late-addition">Late addition!</h2> + +<p>When I posted about this on Mastodon, Thomas Lumley drew my attention to the <code class="language-plaintext highlighter-rouge">sampling</code> library by Yves Tillé and Alina Matei, which contains multiple algorithms for sampling with unequal probabilities without replacement. I hadn’t even stopped to look because really this was about me learning, but yes, of course there’s an R package for that. So I wrapped their <code class="language-plaintext highlighter-rouge">UPbrewer</code> function in a little function to work with my comparison. Unlike me they had read Brewer’s actual article, and their implementation of course works perfectly:</p> + +<object type="image/svg+xml" data="https://freerangestats.info/img/0276-20-10-brewer-better.svg" width="100%"><img src="https://freerangestats.info/img/0276-20-10-brewer-better.png" width="100%" /></object> + +<p>So the version I hacked together based on a Cross-Validated answer clearly doesn’t do justice to the method proposed by Brewer (or more likely was some other method he proposed separately).</p> + +<p>For the record here’s the little convenience function to test the Tillé/Matei implementation:</p> + +<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#--------------using sampling library------------------</span><span class="w"> +</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w"> +</span><span class="n">sample_brewer</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span 
class="n">size</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">keep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">){</span><span class="w"> + </span><span class="n">pik</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">prob</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">size</span><span class="w"> + </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">UPbrewer</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w"> + </span><span class="n">the_sample</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">)]</span><span class="w"> + </span><span class="nf">return</span><span class="p">(</span><span class="n">the_sample</span><span class="p">)</span><span class="w"> + </span><span class="p">}</span><span class="w"> + +</span><span class="n">compare_ppswor</span><span class="p">(</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sample_brewer</span><span class="p">,</span><span class="w"> </span><span 
class="n">reps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10000</span><span class="p">)</span></code></pre></figure> + +<p>It’s <em>much</em> faster too!</p> Sat, 31 Aug 2024 00:00:00 +1100 https://freerangestats.info/blog/2024/08/31/ppswor diff --git a/img/0276-20-10-brewer-better.png b/img/0276-20-10-brewer-better.png new file mode 100644 index 000000000..9a31f1492 Binary files /dev/null and b/img/0276-20-10-brewer-better.png differ diff --git a/img/0276-20-10-brewer-better.svg b/img/0276-20-10-brewer-better.svg new file mode 100644 index 000000000..415b02c2a --- /dev/null +++ b/img/0276-20-10-brewer-better.svg @@ -0,0 +1,102 @@ [SVG figure markup omitted. Axes: "Original 'probability' or weight" and "Actual proportion of selections", both with ticks at 0.025, 0.050, 0.075. Figure text: "Population of 20, sample size 10, sampling without replacement. Using Brewer (1975) as implemented by Tillé/Matei." and "Use of `sample()` with unequal probabilities of sampling".]