Sample

$r.\operatorname{Sample}(\mathbf{x}, k)$

Select $k$ elements from sample $\mathbf{x}$ without replacement using generator $r$.

  • Algorithm: selection sampling (Fan, Muller, Rezucha 1962), see Algorithm
  • Complexity: $O(n)$ time, single pass
  • Output: preserves original order of selected elements
  • Domain: $k \geq 0$ (clamped to $n$ if $k > n$)

Notation

  • $\mathbf{x} = (x_1, \ldots, x_n)$: sample ($n \geq 0$)
  • $x_i$: individual measurements

Properties

  • Simple random sample: each $k$-subset has equal probability
  • Order preservation: selected elements appear in their original order
  • Determinism: same generator state produces same selection
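
The equal-probability property can be checked exactly for small inputs. The sketch below is not part of the library: it applies the acceptance rule from the Algorithm section, (k - selected) / (n - seen), with exact rational arithmetic and multiplies the step probabilities along the path that yields a given subset. The function name subset_probability is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def subset_probability(n: int, k: int, chosen: frozenset) -> Fraction:
    """Exact probability that selection sampling picks exactly the positions in `chosen`."""
    p = Fraction(1)
    selected = 0
    for seen in range(n):
        # Acceptance probability at this step, as in the Algorithm section.
        accept = Fraction(k - selected, n - seen)
        if seen in chosen:
            p *= accept
            selected += 1
        else:
            p *= 1 - accept
    return p


n, k = 5, 3
probs = {c: subset_probability(n, k, frozenset(c)) for c in combinations(range(n), k)}

# Every one of the C(n, k) subsets has probability 1 / C(n, k).
assert all(p == Fraction(1, comb(n, k)) for p in probs.values())
```

Because the arithmetic is exact, the check does not depend on any random generator. It confirms the property for one small (n, k) pair only and is not a general proof.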

Example

  • Rng("demo-sample").Sample([1, 2, 3, 4, 5], 3) — select 3 elements
  • r.Sample(x, n) = x — selecting all elements returns original order

Implementation names

Language      Method
C#            Rng.Sample()
Go            Sample()
Kotlin        Rng.sample()
Rust          Rng::sample()
Python        Rng.sample()
R             rng$sample()
TypeScript    Rng.sample()

$\operatorname{Sample}$ picks a random subset of data without replacement. Common uses include random subsetting, creating cross-validation splits, and reducing a large dataset to a manageable size. Every possible subset of size $k$ has equal probability of being selected, and the selected elements keep their original order. To make subsampling reproducible, use a seeded generator: Rng("training-set").Sample(data, 100) always selects the same 100 elements from the same data.

Algorithm

The $\operatorname{Sample}$ function uses selection sampling (Fan et al. 1962) to select $k$ elements from $n$ without replacement.

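The algorithm makes a single pass through the data. For each element it draws one uniform random number from the Rng generator and includes the element with probability (still to select) / (still to see), so each decision depends on how many elements have already been selected: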

seen = 0, selected = 0
for each element x at position i:
   if uniform() < (k - selected) / (n - seen):
       output x
       selected += 1
   seen += 1

This algorithm preserves the original order of elements and requires only a single pass through the data. The decisions are not independent, because each acceptance probability depends on the earlier outcomes. Even so, every element has marginal inclusion probability $k/n$ and every $k$-subset is equally likely, so the result is a simple random sample.
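
The pseudocode above can be sketched in Python as follows. This is an illustration, not the library's implementation: the function name selection_sample is hypothetical, and the standard library's random.Random stands in for the library's Rng, so the selected elements will not match the library's reference outputs for a given seed.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_sample(x: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Pick k elements of x without replacement, keeping their original order."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(x)
    k = min(k, n)  # clamp: asking for more than n returns everything
    out: List[T] = []
    for seen, item in enumerate(x):
        # Accept with probability (k - selected) / (n - seen).
        # rng.random() is in [0, 1), so a ratio of 1 always accepts
        # and a ratio of 0 never does.
        if rng.random() < (k - len(out)) / (n - seen):
            out.append(item)
    return out


data = list(range(10))
picked = selection_sample(data, 3, random.Random(1729))
print(picked)  # three values from 0..9 in increasing order
```

The sketch draws one random number per element, as the pseudocode does, so the generator advances by the same amount regardless of which elements are accepted. That choice is an assumption here; whether the library implementations stop early once $k$ elements are selected is not stated in this page.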

Tests

$\operatorname{Sample}(\text{seed}, \mathbf{x}, k)$

The $\operatorname{Sample}$ test suite contains 15 test cases validating sampling without replacement. Given a seed, input array $\mathbf{x}$ of size $n$, and draw count $k$, $\operatorname{Sample}$ returns $k$ distinct elements from $\mathbf{x}$, preserving their original order. All tests verify reproducibility: the same seed, input, and $k$ must produce the same output across all language implementations.

Seed variation ($n = 10$, $k = 3$), 3 tests with different seeds:

  • seed-0-n10-k3: seed $= 0$
  • seed-123-n10-k3: seed $= 123$
  • seed-999-n10-k3: seed $= 999$

These tests validate that different seeds produce different samples from the same input.

Parameter variation (seed $= 1729$), 12 tests exploring $n$ and $k$:

  • seed-1729-n1-k1: $n = 1$, $k = 1$ (trivial case, single element)
  • seed-1729-n2-k1: $n = 2$, $k = 1$ (draw one from two)
  • seed-1729-n5-k3: $n = 5$, $k = 3$ (standard draw)
  • seed-1729-n10-k1: $n = 10$, $k = 1$ (single draw from many)
  • seed-1729-n10-k3: $n = 10$, $k = 3$ (standard draw)
  • seed-1729-n10-k5: $n = 10$, $k = 5$ (half draw)
  • seed-1729-n10-k10: $n = 10$, $k = 10$ (full draw, returns all elements in original order)
  • seed-1729-n10-k15: $n = 10$, $k = 15$ ($k > n$, clamped to $n$)
  • seed-1729-n20-k5: $n = 20$, $k = 5$
  • seed-1729-n20-k10: $n = 20$, $k = 10$
  • seed-1729-n100-k10: $n = 100$, $k = 10$ (large pool, small draw)
  • seed-1729-n100-k25: $n = 100$, $k = 25$ (large pool, moderate draw)

The progression from $k = 1$ to $k = n$ to $k > n$ validates boundary handling. When $k \geq n$, the result is a copy of $\mathbf{x}$ in its original order.
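
The boundary behaviour follows from the acceptance rule in the Algorithm section. A short worked check, assuming that $k$ is clamped to $n$ before the loop and that uniform() returns values in $[0, 1)$ (the interval is an assumption; this page does not state it):

```latex
% With k = n, suppose the first i elements were all accepted,
% so selected = seen = i. The acceptance probability at step i is then
p_i = \frac{k - \text{selected}}{n - \text{seen}} = \frac{n - i}{n - i} = 1,
\qquad i = 0, 1, \ldots, n - 1 .
```

By induction every element is accepted, so the output is all of $\mathbf{x}$ in its original order, whatever the seed.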

Seed-based validation. All seed $= 1729$ tests start from the same initial RNG state. Cross-seed tests ($0$, $123$, $999$) confirm that different seeds yield different selections.

References

Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers
Fan, C. T., Muller, Mervin E., Rezucha, Ivan (1962)
Journal of the American Statistical Association