Sample

$r.\operatorname{Sample}(\mathbf{x}, k)$

Select $k$ elements from sample $\mathbf{x}$ without replacement using generator $r$.

Algorithm — selection sampling (Fan, Muller, Rezucha 1962), see the Algorithm section
Complexity — $O(n)$ time, single pass
Output — preserves original order of selected elements
Domain — $k \geq 0$ (clamped to $n$ if $k > n$ )

Notation

$\mathbf{x} = (x_1, \ldots, x_n)$ sample ( $n \geq 0$ )
$x_i$ individual measurements

Properties

Simple random sample each $k$ -subset has equal probability
Order preservation selected elements appear in order of first occurrence
Determinism same generator state produces same selection

Example

Rng("demo-sample").Sample([1, 2, 3, 4, 5], 3) — select 3 elements
r.Sample(x, n) = x — selecting all elements returns original order

Implementation names

Language	Method
C#	`Rng.Sample()`
Go	`Sample()`
Kotlin	`Rng.sample()`
Rust	`Rng::sample()`
Python	`Rng.sample()`
R	`rng$sample()`
TypeScript	`Rng.sample()`

$\operatorname{Sample}$ picks a random subset of data without replacement. Common uses include random subsetting, creating cross-validation splits, and reducing a large dataset to a manageable size. Every possible subset of size $k$ has equal probability of being selected, and the selected elements keep their original order. To make subsampling reproducible, call it on a seeded generator: Rng("training-set").Sample(data, 100) always selects the same 100 elements from the same data.
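As an illustration of the reproducibility point, here is a minimal Python sketch. It is a stand-in, not the library's implementation: the helper name `subsample_in_order` is hypothetical, and it uses Python's standard `random.Random` with `random.sample` on positions followed by a sort, rather than the library's Rng. It will therefore not reproduce the elements that Rng("training-set") selects; it only shows the pattern of seeding once and getting the same ordered subset every time.

```python
import random

def subsample_in_order(data, k, seed):
    """Pick k items without replacement, keeping their original order.

    Stand-in only: uses Python's random.Random, not the library's Rng.
    """
    rng = random.Random(seed)          # seeded generator -> reproducible draws
    k = max(0, min(k, len(data)))      # clamp k to the range [0, n]
    # Draw k distinct positions, then sort them to restore input order.
    positions = sorted(rng.sample(range(len(data)), k))
    return [data[i] for i in positions]

data = list(range(1000))
first = subsample_in_order(data, 100, seed="training-set")
second = subsample_in_order(data, 100, seed="training-set")
assert first == second           # same seed, same subset
assert first == sorted(first)    # input was ascending, so order is preserved
```

Deriving the generator from a descriptive string seed keeps the subset stable across runs, which is useful when the same training subset has to be rebuilt later.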

Algorithm

The $\operatorname{Sample}$ function uses selection sampling (see Fan et al. 1962) to select $k$ elements from $n$ without replacement.

The algorithm makes a single pass through the data and decides for each element whether to include it. The inclusion probability depends on how many elements are still needed and how many remain to be examined; the Rng generator supplies the random draws:

seen = 0, selected = 0
for each element x at position i:
   if uniform() < (k - selected) / (n - seen):
       output x
       selected += 1
   seen += 1
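The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name `selection_sample` is hypothetical, Python's `random.Random` stands in for the Rng generator (so the selected elements will not match the library's output for any seed), and $k \geq 0$ is assumed.

```python
import random

def selection_sample(x, k, rng):
    """Selection sampling: one pass, keeps the input order of chosen items."""
    n = len(x)
    k = min(k, n)                  # clamp k to n, as the Domain note describes
    out = []
    selected = 0
    for seen, item in enumerate(x):
        # Accept with probability (k - selected) / (n - seen).
        # One uniform draw is consumed per element, as in the pseudocode.
        if rng.random() < (k - selected) / (n - seen):
            out.append(item)
            selected += 1
    return out

rng = random.Random(1729)          # stand-in generator, not the library's Rng
picked = selection_sample([1, 2, 3, 4, 5], 3, rng)
assert len(picked) == 3
assert picked == sorted(picked)    # input order is preserved
```

Once the number of elements still needed equals the number remaining, the ratio is exactly 1 and every remaining element is accepted, so the loop always returns exactly $\min(k, n)$ elements.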

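This algorithm preserves the original order of the elements and requires only a single pass through the data. The decisions are not independent: each acceptance probability is adjusted for the selections already made. This guarantees that exactly $k$ elements are chosen, that each element is included with marginal probability $k/n$, and that every $k$-subset is equally likely, which is the definition of a simple random sample.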
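A short derivation shows why the acceptance rule in the pseudocode yields a simple random sample. It is a sketch that assumes uniform() returns values uniformly distributed on $[0, 1)$. Fix a subset $S$ of size $k$ and walk through the $n$ positions. Each selected element contributes a factor $(k - \text{selected})/(n - \text{seen})$, and each skipped element contributes $1 - (k - \text{selected})/(n - \text{seen}) = (n - \text{seen} - k + \text{selected})/(n - \text{seen})$. The denominators run through $n, n-1, \ldots, 1$, the numerators of the selected factors run through $k, k-1, \ldots, 1$, and the numerators of the skipped factors run through $n-k, n-k-1, \ldots, 1$, so

$$P(S) = \frac{k!\,(n-k)!}{n!} = \frac{1}{\binom{n}{k}}$$

regardless of which subset $S$ was fixed. For example, with $n = 3$, $k = 2$, and $S = (x_1, x_3)$:

$$P(S) = \frac{2}{3} \cdot \left(1 - \frac{1}{2}\right) \cdot \frac{1}{1} = \frac{1}{3} = \frac{1}{\binom{3}{2}}$$

The marginal inclusion probability follows by counting the subsets that contain a given element:

$$P(x_i \in S) = \frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}$$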

Tests

$\operatorname{Sample}(\text{seed}, \mathbf{x}, k)$

The $\operatorname{Sample}$ test suite contains 15 test cases validating sampling without replacement. Given a seed, an input array $\mathbf{x}$ of size $n$, and a draw count $k$, $\operatorname{Sample}$ returns $\min(k, n)$ elements taken from distinct positions of $\mathbf{x}$, preserving their original order. All tests verify reproducibility: the same seed, input, and $k$ must produce the same output across all language implementations.

Seed variation ( $n = 10$ , $k = 3$ ) 3 tests with different seeds:

seed-0-n10-k3: seed $= 0$
seed-123-n10-k3: seed $= 123$
seed-999-n10-k3: seed $= 999$

These tests validate that different seeds produce different samples from the same input.

Parameter variation (seed $= 1729$ ) 12 tests exploring $n$ and $k$ :

seed-1729-n1-k1: $n = 1$ , $k = 1$ (trivial case, single element)
seed-1729-n2-k1: $n = 2$ , $k = 1$ (draw one from two)
seed-1729-n5-k3: $n = 5$ , $k = 3$ (standard draw)
seed-1729-n10-k1: $n = 10$ , $k = 1$ (single draw from many)
seed-1729-n10-k3: $n = 10$ , $k = 3$ (standard draw)
seed-1729-n10-k5: $n = 10$ , $k = 5$ (half draw)
seed-1729-n10-k10: $n = 10$ , $k = 10$ (full sample, original order)
seed-1729-n10-k15: $n = 10$ , $k = 15$ ( $k > n$ , clamped to $n$ )
seed-1729-n20-k5: $n = 20$ , $k = 5$
seed-1729-n20-k10: $n = 20$ , $k = 10$
seed-1729-n100-k10: $n = 100$ , $k = 10$ (large pool, small draw)
seed-1729-n100-k25: $n = 100$ , $k = 25$ (large pool, moderate draw)

The progression from $k = 1$ to $k = n$ to $k > n$ validates boundary handling. When $k \geq n$ , the result is a copy of $\mathbf{x}$ in its original order.
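The boundary behavior can be checked without any real randomness. The sketch below is an illustration under assumptions, not the library's test code: `selection_sample` is a hypothetical helper that takes a caller-supplied `uniform` function assumed to return values in $[0, 1)$, and the two stub generators return the extreme values of that range.

```python
def selection_sample(x, k, uniform):
    """Selection sampling driven by a caller-supplied uniform() in [0, 1)."""
    n = len(x)
    k = min(k, n)                  # k > n is clamped to n
    out = []
    for seen, item in enumerate(x):
        if uniform() < (k - len(out)) / (n - seen):
            out.append(item)
    return out

def almost_one():
    return 1.0 - 2.0 ** -53        # largest float64 value below 1.0

def zero():
    return 0.0

x = list(range(10))
# k = n and k > n: the ratio is exactly 1 at every step, so even the
# largest possible draw is accepted and the result is a copy of x.
assert selection_sample(x, 10, almost_one) == x
assert selection_sample(x, 15, almost_one) == x
# k = 0: the ratio is 0, so even a draw of 0.0 is rejected.
assert selection_sample(x, 0, zero) == []
# k = 1 with the smallest possible draw picks the first element.
assert selection_sample(x, 1, zero) == [0]
```

Because the comparison is a strict `<` against a ratio of exactly 1, the $k \geq n$ case does not depend on the generator's output at all, which is why every seed yields the same result there.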

Seed-based validation All seed $= 1729$ tests start from the same initial RNG state. Cross-seed tests ( $0$ , $123$ , $999$ ) confirm that different seeds yield different selections.

References

Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers

Fan, C. T., Muller, Mervin E., Rezucha, Ivan (1962)

Journal of the American Statistical Association

DOI: 10.1080/01621459.1962.10480667