Sample
Select $k$ elements from sample $x$ without replacement using generator $r$.
- Algorithm: selection sampling (Fan, Muller, Rezucha 1962), see Sample
- Complexity: $O(n)$ time, single pass
- Output: preserves original order of selected elements
- Domain: draw count $k$ (clamped to $n$ if $k > n$)
Notation
- $x = (x_1, \ldots, x_n)$: sample ($n$ elements)
- $x_i$: individual measurements
Properties
- Simple random sample: each $k$-subset has equal probability
- Order preservation: selected elements appear in order of first occurrence
- Determinism: same generator state produces same selection
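The equal-probability property can be checked exactly for small inputs. The sketch below assumes the inclusion rule from the Algorithm section, where the element at position `seen` is kept with probability `(k - selected) / (n - seen)`. It multiplies the step probabilities along the path that yields each subset. The helper name `subset_probability` is hypothetical and not part of the library.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def subset_probability(n: int, k: int, chosen: set) -> Fraction:
    """Exact probability that selection sampling picks exactly `chosen`."""
    p = Fraction(1)
    selected = 0
    for seen in range(n):
        # Probability of including the element at position `seen`.
        include = Fraction(k - selected, n - seen)
        if seen in chosen:
            p *= include
            selected += 1
        else:
            p *= 1 - include
    return p


n, k = 5, 3
probs = [subset_probability(n, k, set(c)) for c in combinations(range(n), k)]

# Every k-subset has probability 1 / C(n, k), and the probabilities sum to 1.
assert all(p == Fraction(1, comb(n, k)) for p in probs)
assert sum(probs) == 1
```

Using `Fraction` keeps the arithmetic exact, so the check does not depend on floating-point tolerance or on any random draws.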
Example
- `Rng("demo-sample").Sample([1, 2, 3, 4, 5], 3)`: select 3 elements
- `r.Sample(x, n) = x`: selecting all elements returns original order
Implementation names
| Language | Method |
|---|---|
| C# | Rng.Sample() |
| Go | Sample() |
| Kotlin | Rng.sample() |
| Rust | Rng::sample() |
| Python | Rng.sample() |
| R | rng$sample() |
| TypeScript | Rng.sample() |
`Sample` picks a random subset of data without replacement. Common uses include random subsetting, creating cross-validation splits, and reducing a large dataset to a manageable size. Every possible subset of size $k$ has equal probability of being selected, and the selected elements keep their original order. To make your subsampling reproducible, combine it with a seeded generator: `Sample(data, 100, Rng("training-set"))` will always select the same 100 elements.
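To illustrate the reproducibility pattern without depending on the library, here is a minimal sketch that uses Python's standard `random` module as a stand-in. It is not the library's `Rng` and will not reproduce the library's selections. It draws index positions with `random.Random.sample` and sorts them to keep the original order. The `subsample` helper and the placeholder dataset are assumptions made for illustration.

```python
import random

data = [f"row-{i}" for i in range(1000)]  # placeholder dataset


def subsample(data, k, seed):
    """Seeded, order-preserving subsample (stdlib stand-in, not the library's Rng)."""
    rng = random.Random(seed)  # string seeds are accepted
    k = min(k, len(data))      # clamp k to the dataset size
    picked = sorted(rng.sample(range(len(data)), k))
    return [data[i] for i in picked]


first = subsample(data, 100, "training-set")
second = subsample(data, 100, "training-set")
assert first == second  # same seed, same input, same k: same selection
```

Sorting the drawn indices is what preserves the original order here. The library reaches the same property with a single pass instead.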
Algorithm
The function uses selection sampling (see Fan et al. 1962) to select $k$ elements from $x$ without replacement.
The algorithm makes a single pass through the data. For each element in turn, it decides whether to include it, using the `Rng` generator for the random decision. The inclusion probability depends on how many elements are still needed and how many remain to be seen:
    seen = 0, selected = 0
    for each element x at position i:
        if uniform() < (k - selected) / (n - seen):
            output x
            selected += 1
        seen += 1
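This algorithm preserves the original order of elements (order of first appearance) and requires only a single pass through the data. The decisions are not independent: each one is conditioned on the number of elements already selected, which guarantees that exactly $k$ elements are output. Each element still has marginal inclusion probability $k/n$, and every $k$-subset is equally likely, so the result is a simple random sample.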
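The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's implementation: `random.Random` stands in for the library's `Rng`, so outputs will not match the reference test values, and clamping a negative `k` to 0 is an assumption of this sketch.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_sample(x: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Pick min(k, len(x)) elements of x in one pass, keeping their order."""
    n = len(x)
    k = max(0, min(k, n))  # clamp k into [0, n] (lower bound is an assumption)
    out: List[T] = []
    selected = 0
    for seen, item in enumerate(x):
        if selected == k:
            break  # nothing left to pick
        # Include with probability (k - selected) / (n - seen).
        # Written as a product to avoid a division.
        if rng.random() * (n - seen) < (k - selected):
            out.append(item)
            selected += 1
    return out


rng = random.Random(1729)
print(selection_sample(list(range(10)), 3, rng))
```

When the number of elements still needed equals the number remaining, the condition is always true because `rng.random()` is strictly below 1. This is why the loop always returns exactly `min(k, n)` elements.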
Tests
The test suite contains 15 test cases validating sampling without replacement. Given a seed, an input array $x$ of size $n$, and a draw count $k$, `Sample` returns $k$ distinct elements from $x$, preserving their original order. All tests verify reproducibility: the same seed, input, and $k$ must produce the same output across all language implementations.
Seed variation ($n = 10$, $k = 3$): 3 tests with different seeds:
- `seed-0-n10-k3`: seed $0$
- `seed-123-n10-k3`: seed $123$
- `seed-999-n10-k3`: seed $999$
These tests validate that different seeds produce different samples from the same input.
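Parameter variation (seed $1729$): 12 tests exploring $n$ and $k$:
- `seed-1729-n1-k1`: $n = 1$, $k = 1$ (trivial case, single element)
- `seed-1729-n2-k1`: $n = 2$, $k = 1$ (draw one from two)
- `seed-1729-n5-k3`: $n = 5$, $k = 3$ (standard draw)
- `seed-1729-n10-k1`: $n = 10$, $k = 1$ (single draw from many)
- `seed-1729-n10-k3`: $n = 10$, $k = 3$ (standard draw)
- `seed-1729-n10-k5`: $n = 10$, $k = 5$ (half draw)
- `seed-1729-n10-k10`: $n = 10$, $k = 10$ (full draw, all elements in original order)
- `seed-1729-n10-k15`: $n = 10$, $k = 15$ ($k > n$, clamped to $n$)
- `seed-1729-n20-k5`: $n = 20$, $k = 5$
- `seed-1729-n20-k10`: $n = 20$, $k = 10$
- `seed-1729-n100-k10`: $n = 100$, $k = 10$ (large pool, small draw)
- `seed-1729-n100-k25`: $n = 100$, $k = 25$ (large pool, moderate draw)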
The progression from $k < n$ to $k = n$ to $k > n$ validates boundary handling. When $k \geq n$, the result is a copy of $x$ in its original order.
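The test names encode their parameters, so the expected output length for each case follows from the clamping rule. The sketch below is a hypothetical helper for a test harness. It assumes the `seed-<seed>-n<n>-k<k>` naming pattern seen in the list above and says nothing about the actual input arrays or expected values, which are not given here.

```python
import re
from typing import NamedTuple


class SampleCase(NamedTuple):
    seed: int
    n: int
    k: int

    @property
    def expected_length(self) -> int:
        # k is clamped to n, so at most n elements come back.
        return min(self.k, self.n)


def parse_case(name: str) -> SampleCase:
    """Parse a name such as 'seed-1729-n10-k15' into its parameters."""
    m = re.fullmatch(r"seed-(\d+)-n(\d+)-k(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized test name: {name}")
    seed, n, k = map(int, m.groups())
    return SampleCase(seed, n, k)


for name in ("seed-1729-n10-k5", "seed-1729-n10-k10", "seed-1729-n10-k15"):
    case = parse_case(name)
    print(name, case.expected_length)
```

For the three boundary cases with $n = 10$, the expected lengths are 5, 10, and 10.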
Seed-based validation: All seed $1729$ tests share the same underlying RNG state. Cross-seed tests (seeds $0$, $123$, $999$) confirm that different seeds yield different selections.