In the standard form of the problem, the answers can be in any order and so the order must also be chosen fairly; there is a variation in which the answers must be in sorted order. Any algorithm can be turned into one with a "sorted" order guarantee with an O(k log k) sort, or a "random" order guarantee with an O(k) Fisher-Yates shuffle.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
quadratic rejection sampling | Python | C++ | Random | none | k^2 |
rejection sampling | Python | C++ | Random | Set | k |
iterative random choosing | Python | C++ | Sorted | none | n |
reservoir sampling | Python | C++ | Random | none | n |
SELECT | Python | C++ | Random | n-sized list | n |
HSEL | Python | C++ | Random | Dictionary | k |
Floyd's F2 | Python | C++ | none | Set | k |
"cardchoose" | Python | C++ | Sorted | none | k log k |
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
quadratic rejection sampling | Python | C++ | Random | none | k^2 |
For each output, try choosing an integer in [0, n), but check every integer already in the output to see if it's the same, and reject and re-draw if so. This is quadratic in k, but for small k the small constants can mean this algorithm does very well.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
rejection sampling | Python | C++ | Random | Set | k |
Maintain a set of already-produced values. While the output isn't full, try choosing a random integer in [0, n). If this is in the set of already-produced values, discard it. Otherwise, append it to the output, and add it to the list of already-produced values.
One of the simplest possible algorithms, but the cost of set lookups and insertions is high.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
iterative random choosing | Python | C++ | Sorted | none | n |
Iterate through every possible value in [0, n) in order and decide whether to include it in the output, weighting the probability appropriately depending on the number of spaces left in the output and the number of values remaining to test. One of the faster algorithms for producing sorted output where k is a large fraction of n.
- Donald Knuth
- The Art of Computer Programming
- Volume 2: Seminumerical Algorithms.
- Section 3.4.2
- Page 137
- "Algorithm S"
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
reservoir sampling | Python | C++ | Random | none | n |
Initialize the output with the integers [0, k) in random order. Then iterate through the remaining values [k, n) in order and decide whether to replace an existing, randomly chosen output with that integer. Fast for producing random output where k is a large fraction of n.
- Donald Knuth
- The Art of Computer Programming
- Volume 2: Seminumerical Algorithms.
- Section 3.4.2
- Page 138
- "Algorithm R"
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
SELECT | Python | C++ | Random | n-sized list | n |
Initialize an array of the values [0, n). Pick out values at random to include in the output, and delete them by replacing them with a value from one end and shortening the array.
This is known as SELECT in Ernvall and Nevalainen, though in a slight variation we pick off the array at the start, rather than the end.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
HSEL | Python | C++ | Random | Dictionary | k |
This is a variation of the above in which we use a hash table to simulate the array of values [0, n), and store only where we change it. First proposed by Ernvall and Nevalainen in 1982:
- Jarmo Ernvall and Olli Nevalainen
- An algorithm for unbiased random sampling
- The Computer Journal Volume 25, Number 1, February, 1982
- 45--47
- https://academic.oup.com/comjnl/article/25/1/45/527401
It is also presented in Martin Harriman, Sampling Without Replacement: Durstenfeld-Fisher-Yates Permutation for Very Large Permutation Sizes.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
Floyd's F2 | Python | C++ | none | Set | k |
I refer you to two excellent explanations:
The original presentation is as one of Jon Bentley's Programming Pearls, though the algorithm is due to Robert W Floyd: A Sample of Brilliance.
Algorithm | Python | C++ | Order guarantee | Data structures | Time |
---|---|---|---|---|---|
"cardchoose" | Python | C++ | Sorted | none | k log k |
This is my new algorithm. Detailed description.