Toolkit
This chapter provides formal definitions and properties of each toolkit function.
Synopsis
One-Sample Estimators
- Center — robust location; median of pairwise averages. Like the mean but stable with outliers; tolerates up to 29% corrupted data.
- CenterBounds — bounds on Center with error rate misrate. Exact under weak symmetry.
- Spread — robust dispersion; median of pairwise absolute differences. Same units as data; tolerates up to 29% corrupted data.
- RelSpread — relative dispersion Spread / Center. Dimensionless; compares variability across scales.
Two-Sample Estimators
- Shift — robust location difference; median of pairwise differences. Negative means first sample tends to be lower.
- ShiftBounds — bounds on Shift with error rate misrate. If bounds exclude zero, the difference is reliable.
- Ratio — robust multiplicative ratio via log-space shift. For positive-valued quantities (latency, price, concentration).
- RatioBounds — bounds on Ratio with error rate misrate. If bounds exclude 1, the multiplicative difference is reliable.
- AvgSpread — pooled robust dispersion, weighted by sample sizes.
- Disparity — robust effect size (robust Cohen's d).
Randomization
- Rng — deterministic pseudorandom generator from a seed. Identical sequences across all supported languages.
- Sample — select elements without replacement.
- Shuffle — uniformly random permutation.
- Resample — select elements with replacement.
One-Sample Estimators
Center
Robust measure of location (central tendency).
- Also known as — Hodges-Lehmann estimator, pseudomedian
- Asymptotic — median of the average of two random measurements from the underlying distribution
- Complexity — O(n²) naive, O(n log n) fast (see Fast Center)
- Domain — any real numbers
- Unit — same as measurements
Properties
- Shift equivariance — Center(x + c) = Center(x) + c
- Scale equivariance — Center(c · x) = c · Center(x)
Example
Center([0, 2, 4, 6, 8]) = 4
Center(x + 10) = 14
Center(3x) = 12
References
- Hodges & Lehmann 1963
- Sen 1963
Center is the recommended default for representing where the data is. It works like the familiar mean but does not break when the data contains a few bad measurements or outliers. Up to 29% of data can be corrupted before Center becomes unreliable. When data is clean, Center is nearly as precise as the mean (95% efficiency), so the added protection comes at almost no cost. When uncertain whether to use mean, median, or something else, start with Center.
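The definition above can be sketched in a few lines of Python. This is a naive O(n²) illustration (the name `center` is ours, not the toolkit's API), not the fast algorithm:

```python
import statistics
from itertools import combinations_with_replacement

def center(x):
    # Hodges-Lehmann estimator: median of all pairwise averages (i <= j).
    # Naive O(n^2) enumeration; see Fast Center for the efficient algorithm.
    return statistics.median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))

print(center([0, 2, 4, 6, 8]))       # 4.0
print(center([10, 12, 14, 16, 18]))  # 14.0 (shift equivariance)
print(center([0, 6, 12, 18, 24]))    # 12.0 (scale equivariance)
```

Including the i = j pairs makes the estimator reduce gracefully to the median for tiny samples; some references enumerate only i < j pairs, which gives a slightly different finite-sample value.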
CenterBounds
CenterBounds(x, misrate) = [w_(k/2+1), w_(m−k/2)], where w_(1) ≤ … ≤ w_(m) are the pairwise averages (x_i + x_j)/2 (sorted) for 1 ≤ i ≤ j ≤ n, m = n(n+1)/2, k = SignedRankMargin(n, misrate), and n is the sample size.
Robust bounds on Center with specified coverage.
- Also known as — Wilcoxon signed-rank confidence interval for the Hodges-Lehmann pseudomedian
- Interpretation — misrate is the probability that the true center falls outside the bounds
- Domain — any real numbers, 0 < misrate < 1
- Unit — same as measurements
- Note — assumes weak symmetry and weak continuity; exact for small n, Edgeworth approximation for large n
Properties
- Shift equivariance — CenterBounds(x + c, misrate) = CenterBounds(x, misrate) + c
- Scale equivariance — CenterBounds(c · x, misrate) = c · CenterBounds(x, misrate) (for c > 0)
Example
CenterBounds([1..10], 0.01) = [2.5, 8.5] where Center = 5.5
- Bounds fail to cover the true center with probability at most 0.01
CenterBounds provides not just the estimated center but also the uncertainty of that estimate. The function returns an interval of plausible center values given the data. Set misrate to control how often the bounds might fail to contain the true center: use a larger misrate for everyday analysis and a smaller one for critical decisions where errors are costly. These bounds require weak symmetry but no specific distributional form. If the bounds exclude some reference value, that suggests the true center differs reliably from that value.
Spread
Robust measure of dispersion (variability, scatter).
- Also known as — Shamos scale estimator
- Asymptotic — median of the absolute difference between two random measurements from the underlying distribution
- Complexity — O(n²) naive, O(n log n) fast (see Fast Spread)
- Domain — any real numbers
- Assumptions — sparity(x)
- Unit — same as measurements
Properties
- Shift invariance — Spread(x + c) = Spread(x)
- Scale equivariance — Spread(c · x) = |c| · Spread(x)
- Non-negativity — Spread(x) ≥ 0
Example
Spread([0, 2, 4, 6, 8]) = 4
Spread(x + 10) = 4
Spread(2x) = 8
References
- Shamos 1976
Spread measures how much measurements vary from each other. It serves the same purpose as standard deviation but does not explode with outliers or heavy-tailed data. The result comes in the same units as the measurements, so if Spread is 5 milliseconds, that indicates how much values typically differ. Like Center, it tolerates up to 29% corrupted data. When comparing variability across datasets, Spread gives a reliable answer even when standard deviation would be misleading or infinite.
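As with Center, the naive definition fits in a few lines of Python (the name `spread` is illustrative; the fast algorithm is described separately):

```python
import statistics
from itertools import combinations

def spread(x):
    # Shamos estimator: median of all pairwise absolute differences (i < j).
    # Naive O(n^2) enumeration; see Fast Spread for the efficient algorithm.
    return statistics.median(abs(a - b) for a, b in combinations(x, 2))

print(spread([0, 2, 4, 6, 8]))       # 4.0
print(spread([10, 12, 14, 16, 18]))  # 4.0 (shift invariance)
print(spread([0, 4, 8, 12, 16]))     # 8.0 (scale equivariance)
```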
RelSpread
Relative dispersion normalized by location: RelSpread(x) = Spread(x) / Center(x).
- Also known as — robust coefficient of variation
- Domain — positive real numbers
- Assumptions — positivity(x)
- Unit — dimensionless
Properties
- Scale invariance — RelSpread(c · x) = RelSpread(x) (for c > 0)
- Non-negativity — RelSpread(x) ≥ 0
Example
RelSpread([1, 3, 5, 7, 9]) = 0.8
RelSpread(5x) = 0.8
RelSpread compares how noisy different datasets are, even if they have completely different scales or units. A dataset centered around 100 with spread of 10 has the same relative variability as one centered around 1000 with spread of 100. Both show 10% relative variation, and RelSpread captures exactly this. This makes it useful for comparing measurement quality across different experiments, instruments, or physical quantities where absolute numbers are not directly comparable.
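A minimal sketch combining the two naive estimators above (function names are ours):

```python
import statistics
from itertools import combinations, combinations_with_replacement

def center(x):
    # Hodges-Lehmann: median of pairwise averages (i <= j).
    return statistics.median((a + b) / 2 for a, b in combinations_with_replacement(x, 2))

def spread(x):
    # Shamos: median of pairwise absolute differences (i < j).
    return statistics.median(abs(a - b) for a, b in combinations(x, 2))

def rel_spread(x):
    # Robust coefficient of variation; requires strictly positive data.
    return spread(x) / center(x)

print(rel_spread([1, 3, 5, 7, 9]))        # 0.8
print(rel_spread([5, 15, 25, 35, 45]))    # 0.8 (scale invariance)
```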
Two-Sample Estimators
Shift
Robust measure of location difference between two samples.
- Also known as — Hodges-Lehmann estimator for two samples
- Asymptotic — median of the difference between random measurements from the two underlying distributions
- Complexity — O(nm) naive, O((n+m) log(n+m)) fast (see Fast Shift)
- Domain — any real numbers
- Unit — same as measurements
Properties
- Self-difference — Shift(x, x) = 0
- Shift equivariance — Shift(x + c, y + d) = Shift(x, y) + c − d
- Scale equivariance — Shift(c · x, c · y) = c · Shift(x, y)
- Antisymmetry — Shift(y, x) = −Shift(x, y)
Example
Shift([0, 2, 4, 6, 8], [10, 12, 14, 16, 18]) = -10
Shift(y, x) = -Shift(x, y)
References
- Hodges & Lehmann 1963
- Sidak et al. 1999
Shift measures how much one group differs from another. When comparing response times between version A and version B, Shift tells by how many milliseconds A is faster or slower than B. A negative result means the first group tends to be lower; positive means it tends to be higher. Unlike comparing means, Shift handles outliers gracefully and works well with skewed data. The result comes in the same units as your measurements, making it easy to interpret.
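The naive definition, sketched in Python (illustrative name, not the toolkit's API):

```python
import statistics

def shift(x, y):
    # Two-sample Hodges-Lehmann: median of all n*m pairwise differences x_i - y_j.
    # Naive O(nm) enumeration; see Fast Shift for the efficient algorithm.
    return statistics.median(a - b for a in x for b in y)

print(shift([0, 2, 4, 6, 8], [10, 12, 14, 16, 18]))  # -10
print(shift([10, 12, 14, 16, 18], [0, 2, 4, 6, 8]))  # 10 (antisymmetry)
```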
ShiftBounds
ShiftBounds(x, y, misrate) = [d_(k/2+1), d_(nm−k/2)], where d_(1) ≤ … ≤ d_(nm) are the pairwise differences x_i − y_j (sorted), and k = PairwiseMargin(n, m, misrate).
Robust bounds on Shift with specified coverage.
- Also known as — distribution-free confidence interval for Hodges-Lehmann
- Interpretation — misrate is the probability that the true shift falls outside the bounds
- Domain — any real numbers, 0 < misrate < 1
- Unit — same as measurements
- Note — assumes weak continuity (ties from measurement resolution are tolerated but may yield conservative bounds)
Properties
- Shift invariance — ShiftBounds(x + c, y + c, misrate) = ShiftBounds(x, y, misrate)
- Scale equivariance — ShiftBounds(c · x, c · y, misrate) = c · ShiftBounds(x, y, misrate) (for c > 0)
Example
ShiftBounds([1..30], [21..50], 1e-4) = [-30, -10] where Shift = -20
- Bounds fail to cover the true shift with probability at most 1e-4
ShiftBounds provides not just the estimated shift but also the uncertainty of that estimate. The function returns an interval of plausible shift values given the data. Set misrate to control how often the bounds might fail to contain the true shift: use a larger misrate for everyday analysis and a smaller one for critical decisions where errors are costly. These bounds require no assumptions about your data distribution, so they remain valid for any continuous measurements. If the bounds exclude zero, that suggests a reliable difference between the two groups.
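The construction can be sketched as order statistics of the sorted pairwise differences. To keep the sketch self-contained, the exclusion count here comes from a normal approximation to the Mann-Whitney U distribution instead of the toolkit's PairwiseMargin, so the resulting bounds are approximate (slightly wider or narrower than the exact ones); `approx_shift_bounds` is our illustrative name:

```python
import math
from statistics import NormalDist

def approx_shift_bounds(x, y, misrate):
    # Exclude k pairwise differences from each tail; k is the misrate/2
    # quantile of a normal approximation to the Mann-Whitney U distribution.
    # The toolkit uses PairwiseMargin (exact for small samples) instead.
    n, m = len(x), len(y)
    diffs = sorted(a - b for a in x for b in y)
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    k = max(0, int(mu + sigma * NormalDist().inv_cdf(misrate / 2)))
    return diffs[k], diffs[n * m - 1 - k]

lo, hi = approx_shift_bounds(list(range(1, 31)), list(range(21, 51)), 1e-4)
print(lo, hi)  # close to the exact [-30, -10] tabulated above
```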
Ratio
Robust measure of scale ratio between two samples — the multiplicative dual of Shift.
- Asymptotic — geometric median of pairwise ratios (via log-space aggregation)
- Domain — x_i > 0, y_j > 0
- Assumptions — positivity(x), positivity(y)
- Unit — dimensionless
- Complexity — O(nm) naive, fast via FastRatio
Properties
- Self-ratio — Ratio(x, x) = 1
- Scale equivariance — Ratio(a · x, b · y) = (a/b) · Ratio(x, y) (for a, b > 0)
- Multiplicative antisymmetry — Ratio(y, x) = 1 / Ratio(x, y)
Example
Ratio([1, 2, 4, 8, 16], [2, 4, 8, 16, 32]) = 0.5
Ratio(x, x) = 1
Ratio(2x, 5y) = 0.4 · Ratio(x, y)
Relationship to Shift
Ratio is the multiplicative analog of Shift. While Shift computes the median of pairwise differences x_i − y_j, Ratio computes the median of pairwise ratios x_i / y_j via log-transformation. This relationship is expressed formally as:
Ratio(x, y) = exp(Shift(log x, log y))
The log-transformation converts multiplicative relationships to additive ones, allowing the fast algorithm to compute the result efficiently. The exp-transformation converts back to the ratio scale.
Ratio captures multiplicative relationships rather than additive differences. If one system is twice as fast or prices are 30% lower, the underlying thinking is in ratios. A result of 0.5 means the first group is typically half the size of the second; 2.0 means twice as large. This estimator is appropriate for quantities like prices, response times, and concentrations where relative comparisons make more sense than absolute ones. Both samples must contain strictly positive values.
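The log-space relationship above can be sketched directly (names are illustrative; the naive O(nm) shift is used):

```python
import math
import statistics

def shift(x, y):
    # Two-sample Hodges-Lehmann: median of pairwise differences.
    return statistics.median(a - b for a in x for b in y)

def ratio(x, y):
    # Ratio(x, y) = exp(Shift(log x, log y)): the median pairwise ratio,
    # computed additively in log-space and converted back.
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    return math.exp(shift(log_x, log_y))

print(ratio([1, 2, 4, 8, 16], [2, 4, 8, 16, 32]))  # ≈ 0.5
```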
RatioBounds
Robust bounds on Ratio with specified coverage — the multiplicative dual of ShiftBounds.
- Also known as — distribution-free confidence interval for Hodges-Lehmann ratio
- Interpretation — misrate is the probability that the true ratio falls outside the bounds
- Domain — x_i > 0, y_j > 0, 0 < misrate < 1
- Assumptions — positivity(x), positivity(y)
- Unit — dimensionless
- Note — assumes weak continuity (ties from measurement resolution are tolerated but may yield conservative bounds)
Properties
- Scale invariance — RatioBounds(c · x, c · y, misrate) = RatioBounds(x, y, misrate) (for c > 0)
- Scale equivariance — RatioBounds(a · x, b · y, misrate) = (a/b) · RatioBounds(x, y, misrate) (for a, b > 0)
- Multiplicative antisymmetry (bounds reversed) — RatioBounds(y, x, misrate) = [1/upper, 1/lower]
Example
RatioBounds([1..30], [10..40], 1e-4) where Ratio ≈ 0.5 yields bounds containing 0.5
- Bounds fail to cover the true ratio with probability at most 1e-4
Relationship to ShiftBounds
RatioBounds is computed via log-transformation:
RatioBounds(x, y, misrate) = exp(ShiftBounds(log x, log y, misrate))
This means if ShiftBounds returns [L, U] for the log-transformed samples, RatioBounds returns [exp(L), exp(U)].
RatioBounds provides not just the estimated ratio but also the uncertainty of that estimate. The function returns an interval of plausible ratio values given the data. Set misrate to control how often the bounds might fail to contain the true ratio: use a larger misrate for everyday analysis and a smaller one for critical decisions where errors are costly. These bounds require no assumptions about your data distribution, so they remain valid for any continuous positive measurements. If the bounds exclude 1, that suggests a reliable multiplicative difference between the two groups.
AvgSpread
Weighted average of dispersions (pooled scale): AvgSpread(x, y) = (n · Spread(x) + m · Spread(y)) / (n + m).
- Also known as — robust pooled standard deviation
- Domain — any real numbers
- Assumptions — sparity(x), sparity(y)
- Unit — same as measurements
- Caveat — AvgSpread(x, y) ≠ Spread(x ∪ y) (pooled scale, not concatenated spread)
Properties
- Self-average — AvgSpread(x, x) = Spread(x)
- Symmetry — AvgSpread(x, y) = AvgSpread(y, x)
- Scale equivariance — AvgSpread(c · x, c · y) = |c| · AvgSpread(x, y)
- Mixed scaling
Example
AvgSpread(x, y) = 5 where Spread(x) = 6, Spread(y) = 4, n = m
AvgSpread(x, y) = AvgSpread(y, x)
AvgSpread provides a single number representing the typical variability across two groups. It combines the spread of both samples, giving more weight to larger samples since they provide more reliable estimates. This pooled spread serves as a common reference scale, essential for expressing a difference in relative terms. Disparity uses AvgSpread internally to normalize the shift into a scale-free effect size.
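The sample-size weighting can be sketched as follows (illustrative names, naive Spread):

```python
import statistics
from itertools import combinations

def spread(x):
    # Shamos estimator: median of pairwise absolute differences.
    return statistics.median(abs(a - b) for a, b in combinations(x, 2))

def avg_spread(x, y):
    # Pooled robust dispersion: spreads weighted by sample sizes.
    n, m = len(x), len(y)
    return (n * spread(x) + m * spread(y)) / (n + m)

# Spread(x) = 6, Spread(y) = 4, equal sizes -> pooled value 5
print(avg_spread([0, 3, 6, 9, 12], [0, 2, 4, 6, 8]))  # 5.0
```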
Disparity
Robust effect size (shift normalized by pooled dispersion): Disparity(x, y) = Shift(x, y) / AvgSpread(x, y).
- Also known as — robust Cohen's d (Cohen 1988; estimates differ due to robust construction)
- Domain — any real numbers
- Assumptions — sparity(x), sparity(y)
- Unit — spread units
Properties
- Location invariance — Disparity(x + c, y + c) = Disparity(x, y)
- Scale invariance — Disparity(c · x, c · y) = Disparity(x, y) (for c > 0)
- Antisymmetry — Disparity(y, x) = −Disparity(x, y)
Example
Disparity(x, y) = 0.4 where Shift = 2, AvgSpread = 5
Disparity(x + c, y + c) = Disparity(x, y)
Disparity(kx, ky) = Disparity(x, y)
Disparity expresses a difference between groups in a way that does not depend on the original measurement units. A disparity of 0.5 means the groups differ by half a spread unit; 1.0 means one full spread unit. Being dimensionless allows comparison of effect sizes across different studies, metrics, or measurement scales. What counts as a large or small disparity depends entirely on the domain and what matters practically in a given application. Do not rely on universal thresholds; interpret the number in context.
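Putting the two-sample pieces together, a self-contained naive sketch (illustrative names):

```python
import statistics
from itertools import combinations

def spread(x):
    # Shamos estimator: median of pairwise absolute differences.
    return statistics.median(abs(a - b) for a, b in combinations(x, 2))

def shift(x, y):
    # Two-sample Hodges-Lehmann: median of pairwise differences.
    return statistics.median(a - b for a in x for b in y)

def avg_spread(x, y):
    # Pooled robust dispersion, weighted by sample sizes.
    n, m = len(x), len(y)
    return (n * spread(x) + m * spread(y)) / (n + m)

def disparity(x, y):
    # Robust effect size: Shift normalized by AvgSpread.
    return shift(x, y) / avg_spread(x, y)

print(disparity([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]))  # -0.25
```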
Randomization
Rng
Deterministic pseudorandom number generator from a seed.
- Algorithm — xoshiro256++ with SplitMix64 seeding (see Pseudorandom Number Generation)
- Seed types — integer seed or string seed (hashed via FNV-1a)
- Determinism — identical sequences across all supported languages
- Period — 2^256 − 1
Notation
- X, Y — random variables (generators of real measurements)
Properties
- Reproducibility — the same seed produces an identical sequence
- Independence — different seeds produce uncorrelated sequences
Example
Rng("demo-uniform") — string seed for reproducible demos
Rng("experiment-1") — string seed for named experiments
Rng provides reproducible random numbers. The same seed produces exactly the same sequence of values, identical across Python, TypeScript, R, C#, Kotlin, Rust, and Go. Passing a descriptive string like "experiment-1" makes code self-documenting. Each draw advances the generator's internal state, so independent random streams require separate generators with different seeds.
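The generator can be sketched as a straight Python transcription of xoshiro256++ with SplitMix64 state expansion. This handles integer seeds only (the FNV-1a hashing of string seeds is omitted) and is not the toolkit's reference implementation:

```python
MASK64 = (1 << 64) - 1

def _rotl(x, k):
    # 64-bit left rotation
    return ((x << k) | (x >> (64 - k))) & MASK64

class Rng:
    def __init__(self, seed):
        # SplitMix64 expands a 64-bit seed into the 256-bit xoshiro state.
        s = seed & MASK64
        self.state = []
        for _ in range(4):
            s = (s + 0x9E3779B97F4A7C15) & MASK64
            z = s
            z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
            z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
            self.state.append(z ^ (z >> 31))

    def next_u64(self):
        # xoshiro256++ step: output rotl(s0 + s3, 23) + s0, then update state.
        s0, s1, s2, s3 = self.state
        result = (_rotl((s0 + s3) & MASK64, 23) + s0) & MASK64
        t = (s1 << 17) & MASK64
        s2 ^= s0
        s3 ^= s1
        s1 ^= s2
        s0 ^= s3
        s2 ^= t
        s3 = _rotl(s3, 45)
        self.state = [s0, s1, s2, s3]
        return result

    def uniform(self):
        # Top 53 bits -> float in [0, 1)
        return (self.next_u64() >> 11) * (2.0 ** -53)

r = Rng(42)
print(r.uniform())  # deterministic float in [0, 1)
```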
Sample
Select k elements from sample x without replacement using generator r.
- Algorithm — selection sampling (Fan, Muller, Rezucha 1962), see Pseudorandom Number Generation
- Complexity — O(n) time, single pass
- Output — preserves original order of selected elements
- Domain — 0 ≤ k ≤ n
Notation
- x, y — samples (n = |x|, m = |y|)
- x_i, y_j — individual measurements
Properties
- Simple random sample — each k-subset has equal probability
- Order preservation — selected elements appear in order of first occurrence
- Determinism — same generator state produces same selection
Example
Sample([1, 2, 3, 4, 5], 3, Rng("demo-sample")) — select 3 elements
Sample(x, n, r) = x — selecting all elements returns original order
Sample picks a random subset of data without replacement. Common uses include creating cross-validation splits or reducing a large dataset to a manageable size. Every possible subset of size k has equal probability of being selected, and the selected elements keep their original order. To make subsampling reproducible, combine it with a seeded generator: Sample(data, 100, Rng("training-set")) will always select the same 100 elements.
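The selection-sampling algorithm is short enough to sketch; here Python's own seeded random.Random stands in for the toolkit's Rng:

```python
import random

def sample(x, k, rng):
    # Selection sampling (Fan-Muller-Rezucha): one pass, order-preserving.
    # Each remaining element is taken with probability needed/remaining,
    # which makes every k-subset equally likely.
    out = []
    remaining, needed = len(x), k
    for item in x:
        if needed > 0 and rng.random() * remaining < needed:
            out.append(item)
            needed -= 1
        remaining -= 1
    return out

picked = sample(list(range(100)), 10, random.Random("training-set"))
print(picked)  # same 10 elements on every run, in original order
```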
Shuffle
Uniformly random permutation of sample x using generator r.
- Algorithm — Fisher-Yates (Knuth shuffle), see Pseudorandom Number Generation
- Complexity — O(n) time, O(n) additional space
- Output — new array (does not modify input)
Properties
- Uniformity — each of the n! permutations has equal probability
- Determinism — same generator state produces same permutation
Example
Shuffle([1, 2, 3, 4, 5], Rng("demo-shuffle")) — shuffled copy
Shuffle(x, r) preserves multiset (same elements, different order)
Shuffle produces a random reordering of data. This is essential for permutation tests and useful for eliminating any bias from the original ordering. Every possible arrangement has exactly equal probability, which is required for valid statistical inference. The function returns a new shuffled array and leaves the original data unchanged. For reproducible results, pass a seeded generator: Shuffle(data, Rng("experiment-1")) will always produce the same permutation.
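The Fisher-Yates procedure can be sketched as follows (random.Random again stands in for Rng):

```python
import random

def shuffle(x, rng):
    # Fisher-Yates: walk from the end, swapping each position with a
    # uniformly chosen earlier (or same) position. Returns a new list.
    out = list(x)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        out[i], out[j] = out[j], out[i]
    return out

data = [1, 2, 3, 4, 5]
print(shuffle(data, random.Random("experiment-1")))  # reproducible permutation
print(data)  # [1, 2, 3, 4, 5] -- input unchanged
```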
Resample
Select k elements from sample x with replacement using generator r.
- Algorithm — uniform sampling with replacement
- Complexity — O(k) time
- Output — new array with k elements (may contain duplicates)
- Domain — k ≥ 0, sample size n ≥ 1
Notation
- x — sample (n = |x|)
- x_i — individual measurements
Properties
- Independence — each selection is independent with equal probability 1/n
- Duplicates — same element may appear multiple times in output
- Determinism — same generator state produces same selection
Example
Resample([1, 2, 3, 4, 5], 3, Rng("demo-resample")) — select 3 with replacement
Resample(x, n, r) — bootstrap sample of same size as original
Resample picks elements with replacement, allowing the same element to be selected multiple times. This is essential for bootstrap methods where we simulate new samples from the observed data. Unlike Sample (without replacement), Resample can produce outputs larger than the input and will typically contain duplicate values. For reproducible bootstrap, combine with a seeded generator: Resample(data, n, Rng("bootstrap-1")).
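Sampling with replacement is a one-liner (random.Random stands in for Rng):

```python
import random

def resample(x, k, rng):
    # Uniform sampling with replacement: each draw independently picks
    # any element with probability 1/len(x); duplicates are expected.
    return [x[rng.randrange(len(x))] for _ in range(k)]

boot = resample([1, 2, 3, 4, 5], 5, random.Random("bootstrap-1"))
print(boot)  # a reproducible bootstrap sample; may contain duplicates
```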
Auxiliary
Median
The value splitting a sorted sample into two equal parts.
- Also known as — 50th percentile, second quartile (Q2)
- Asymptotic — value m such that P(X ≤ m) = 1/2
- Complexity — O(n) with selection, O(n log n) with sorting
- Domain — any real numbers
- Unit — same as measurements
Notation
- x_(1) ≤ x_(2) ≤ … ≤ x_(n) — order statistics (sorted sample)
Properties
- Shift equivariance — Median(x + c) = Median(x) + c
- Scale equivariance — Median(c · x) = c · Median(x)
Example
Median([1, 2, 3, 4, 5]) = 3
Median([1, 2, 3, 4]) = 2.5
Median provides maximum protection against outliers and corrupted data. It achieves a 50% breakdown point, meaning that up to half of the data can be arbitrarily bad before the estimate becomes meaningless. However, this extreme robustness comes at a cost: the median is less precise than Center when data is clean. For most practical applications, Center offers a better tradeoff (29% breakdown with 95% efficiency). Reserve Median for situations with suspected contamination levels above 29% or when the strongest possible robustness guarantee is needed.
PairwiseMargin
Exclusion count for dominance-based bounds.
- Purpose — determines extreme pairwise differences to exclude when constructing bounds
- Based on — distribution of under random sampling
- Returns — total margin split evenly between lower and upper tails
- Used by — ShiftBounds to select appropriate order statistics
- Complexity — exact for small samples, approximated for large (see Fast PairwiseMargin)
- Domain — n, m ≥ 1, misrate not below the minimum achievable for the given sample sizes
- Unit — count
- Note — assumes weak continuity (ties from measurement resolution are tolerated)
Properties
- Symmetry — PairwiseMargin(n, m, misrate) = PairwiseMargin(m, n, misrate)
- Bounds — 0 ≤ PairwiseMargin(n, m, misrate) < nm
Example
PairwiseMargin(30, 30, 1e-6) = 276
PairwiseMargin(30, 30, 1e-4) = 390
PairwiseMargin(30, 30, 1e-3) = 464
This is a supporting function that ShiftBounds uses internally, so most users do not need to call it directly. It calculates how many extreme pairwise differences should be excluded when constructing bounds, based on sample sizes and the desired error rate. A lower misrate (higher confidence) results in a smaller margin, which produces wider bounds. The function automatically chooses between exact computation for small samples and a fast approximation for large samples.
SignedRankMargin
Exclusion count for one-sample signed-rank based bounds.
- Purpose — determines extreme pairwise averages to exclude when constructing bounds
- Based on — Wilcoxon signed-rank distribution under weak symmetry
- Returns — total margin split evenly between lower and upper tails
- Used by — CenterBounds to select appropriate order statistics
- Complexity — exact for small n, approximated for larger n (see Fast SignedRankMargin)
- Domain — n ≥ 1, 0 < misrate < 1
- Unit — count
- Note — assumes weak symmetry and weak continuity
Properties
- Bounds — 0 ≤ SignedRankMargin(n, misrate) < n(n+1)/2
- Monotonicity — lower misrate ⇒ smaller margin ⇒ wider bounds
Example
SignedRankMargin(10, 0.05) = 18
SignedRankMargin(30, 1e-4) = 112
SignedRankMargin(100, 1e-6) = 706
This is a supporting function that CenterBounds uses internally, so most users do not need to call it directly. It calculates how many extreme pairwise averages should be excluded when constructing bounds, based on sample size and the desired error rate. A lower misrate (higher confidence) results in a smaller margin, which produces wider bounds. The function automatically chooses between exact computation for small samples and a fast approximation for large samples.