Studies

This section analyzes the estimators' properties using mathematical proofs. Most proofs are adapted from various textbooks and research papers, but only essential references are provided.

Unlike the main part of the manual, the studies require knowledge of classic statistical methods. Well-known facts and commonly accepted notation are used without special introduction. The studies provide detailed analyses of estimator properties for practitioners interested in rigorous proofs and numerical simulation results.

Summary Estimator Properties

This section compares the toolkit's robust estimators against traditional statistical methods to demonstrate their advantages. While traditional estimators often work well under ideal conditions, the toolkit's estimators maintain reliable performance across diverse real-world scenarios.

Average Estimators:

Mean (arithmetic average):

\operatorname{Mean}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n x_i

Median:

\operatorname{Median}(\mathbf{x}) = \begin{cases} x_{(\frac{n+1}{2})} & \text{if } n \text{ is odd} \\ (x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}) / 2 & \text{if } n \text{ is even} \end{cases}

Center (Hodges-Lehmann estimator):

\operatorname{Center}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{Median}} \left( \frac{x_i + x_j}{2} \right)

Dispersion Estimators:

Standard Deviation:

\operatorname{StdDev}(\mathbf{x}) = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \operatorname{Mean}(\mathbf{x}))^2}

Median Absolute Deviation (around the median):

\operatorname{MAD}(\mathbf{x}) = \operatorname{Median}(\lvert x_i - \operatorname{Median}(\mathbf{x}) \rvert)

Spread (Shamos scale estimator):

\operatorname{Spread}(\mathbf{x}) = \underset{1 \leq i < j \leq n}{\operatorname{Median}} \lvert x_i - x_j \rvert
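For reference, the definitions above can be written as a minimal Python sketch. The function names are illustrative (not toolkit API), and the naive O(n²) pairwise loops favor clarity over the faster selection algorithms a production implementation would use:

```python
# Naive sketch of the summary estimators defined above.
import statistics
from itertools import combinations

def mean(x):
    return sum(x) / len(x)

def median(x):
    return statistics.median(x)

def center(x):
    # Hodges-Lehmann: median of all pairwise averages (i <= j)
    n = len(x)
    pairs = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(pairs)

def std_dev(x):
    m = mean(x)
    return (sum((v - m) ** 2 for v in x) / (len(x) - 1)) ** 0.5

def mad(x):
    med = median(x)
    return median([abs(v - med) for v in x])

def spread(x):
    # Shamos: median of all pairwise absolute differences (i < j)
    return statistics.median([abs(a - b) for a, b in combinations(x, 2)])

print(center([1, 2, 3, 4, 5]))  # -> 3.0
print(spread([1, 2, 3, 4, 5]))  # -> 2.0
```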

Breakdown

Heavy-tailed distributions naturally produce extreme outliers that completely distort traditional estimators. A single extreme measurement from the \underline{\operatorname{Power}} distribution can make the sample mean arbitrarily large. Real-world data can also contain corrupted measurements from instrument failures, recording errors, or transmission problems. Both natural extremes and data corruption create the same challenge: extracting reliable information when some measurements are too influential.

The breakdown point (Huber 2009) is the fraction of a sample that can be replaced by arbitrarily large values without making an estimator arbitrarily large. The theoretical maximum is 50\%; no estimator can guarantee reliable results when more than half the measurements are extreme or corrupted. In such cases, summary estimators are not applicable, and a more sophisticated approach is needed.

A 50\% breakdown point is rarely needed in practice, as more conservative values also cover practical needs. Additionally, a high breakdown point corresponds to low precision (information is lost by neglecting part of the data). The optimal practical breakdown point should be between 0\% (no robustness) and 50\% (low precision).

The \operatorname{Center} and \operatorname{Spread} estimators achieve 29\% breakdown points, providing substantial protection against realistic contamination levels while maintaining good precision. Below is a comparison with traditional estimators.

Asymptotic breakdown points for average estimators:

| \operatorname{Mean} | \operatorname{Median} | \operatorname{Center} |
|---|---|---|
| 0\% | 50\% | 29\% |

Asymptotic breakdown points for dispersion estimators:

| \operatorname{StdDev} | \operatorname{MAD} | \operatorname{Spread} |
|---|---|---|
| 0\% | 50\% | 29\% |
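The breakdown contrast is easy to see in a quick numerical experiment (illustrative Python; the data values are invented):

```python
# One corrupted measurement: the mean breaks down (0% breakdown point),
# while the median barely moves (50% breakdown point).
import statistics

clean = [4.2, 4.5, 4.1, 4.8, 4.4, 4.6, 4.3, 4.7]
corrupted = clean[:-1] + [1e9]  # replace one value with an extreme outlier

print(statistics.mean(clean), statistics.mean(corrupted))      # mean explodes
print(statistics.median(clean), statistics.median(corrupted))  # median stays near 4.45
```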

Drift

Drift measures estimator precision by quantifying how much estimates scatter across repeated samples. It is based on the \operatorname{Spread} of estimates and therefore has a breakdown point of approximately 29\%.

Drift is useful for comparing the precision of several estimators. To simplify the comparison, one of the estimators can be chosen as a baseline. A table of squared drift values, normalized by the baseline, shows the required sample size adjustment factor for switching from the baseline to another estimator. For example, if \operatorname{Center} is the baseline and the rescaled squared drift of \operatorname{Median} is 1.5, this means that \operatorname{Median} requires 1.5 times more data than \operatorname{Center} to achieve the same precision. See the From Statistical Efficiency to Drift section for details.

Squared Asymptotic Drift of Average Estimators (values are approximated):

| | \operatorname{Mean} | \operatorname{Median} | \operatorname{Center} |
|---|---|---|---|
| \underline{\operatorname{Additive}} | 1.0 | 1.571 | 1.047 |
| \underline{\operatorname{Multiplic}} | 3.95 | 1.40 | 1.7 |
| \underline{\operatorname{Exp}} | 1.88 | 1.88 | 1.69 |
| \underline{\operatorname{Power}} | \infty | 0.9 | 2.1 |
| \underline{\operatorname{Uniform}} | 0.88 | 2.60 | 0.94 |

Rescaled to Center\operatorname{Center} (sample size adjustment factors):

| | \operatorname{Mean} | \operatorname{Median} | \operatorname{Center} |
|---|---|---|---|
| \underline{\operatorname{Additive}} | 0.96 | 1.50 | 1.0 |
| \underline{\operatorname{Multiplic}} | 2.32 | 0.82 | 1.0 |
| \underline{\operatorname{Exp}} | 1.11 | 1.11 | 1.0 |
| \underline{\operatorname{Power}} | \infty | 0.43 | 1.0 |
| \underline{\operatorname{Uniform}} | 0.936 | 2.77 | 1.0 |


Squared Asymptotic Drift of Dispersion Estimators (values are approximated):

| | \operatorname{StdDev} | \operatorname{MAD} | \operatorname{Spread} |
|---|---|---|---|
| \underline{\operatorname{Additive}} | 0.45 | 1.22 | 0.52 |
| \underline{\operatorname{Multiplic}} | \infty | 2.26 | 1.81 |
| \underline{\operatorname{Exp}} | 1.69 | 1.92 | 1.26 |
| \underline{\operatorname{Power}} | \infty | 3.5 | 4.4 |
| \underline{\operatorname{Uniform}} | 0.18 | 0.90 | 0.43 |

Rescaled to Spread\operatorname{Spread} (sample size adjustment factors):

| | \operatorname{StdDev} | \operatorname{MAD} | \operatorname{Spread} |
|---|---|---|---|
| \underline{\operatorname{Additive}} | 0.87 | 2.35 | 1.0 |
| \underline{\operatorname{Multiplic}} | \infty | 1.25 | 1.0 |
| \underline{\operatorname{Exp}} | 1.34 | 1.52 | 1.0 |
| \underline{\operatorname{Power}} | \infty | 0.80 | 1.0 |
| \underline{\operatorname{Uniform}} | 0.42 | 2.09 | 1.0 |

Invariance

Invariance properties determine how estimators respond to data transformations. These properties are crucial for analysis design and interpretation:

  • Location-invariant estimators are invariant to additive shifts: T(\mathbf{x}+k)=T(\mathbf{x})
  • Scale-invariant estimators are invariant to positive rescaling: T(k \cdot \mathbf{x})=T(\mathbf{x}) for k>0
  • Equivariant estimators change predictably with transformations, maintaining relative relationships

Choosing estimators with appropriate invariance properties ensures that results remain meaningful across different measurement scales, units, and data transformations. For example, when comparing datasets collected with different instruments or protocols, location-invariant estimators eliminate the need for data centering, while scale-invariant estimators eliminate the need for normalization.

Location-invariance: An estimator T is location-invariant if adding a constant to the measurements leaves the result unchanged:

T(\mathbf{x} + k) = T(\mathbf{x}) \qquad T(\mathbf{x} + k, \mathbf{y} + k) = T(\mathbf{x}, \mathbf{y})

Location-equivariance: An estimator T is location-equivariant if it shifts with the data:

T(\mathbf{x} + k) = T(\mathbf{x}) + k \qquad T(\mathbf{x} + k_1, \mathbf{y} + k_2) = T(\mathbf{x}, \mathbf{y}) + f(k_1, k_2)

Scale-invariance: An estimator T is scale-invariant if multiplying by a positive constant leaves the result unchanged:

T(k \cdot \mathbf{x}) = T(\mathbf{x}) \quad \text{for } k > 0 \qquad T(k \cdot \mathbf{x}, k \cdot \mathbf{y}) = T(\mathbf{x}, \mathbf{y}) \quad \text{for } k > 0

Scale-equivariance: An estimator T is scale-equivariant if it scales proportionally with the data:

T(k \cdot \mathbf{x}) = k \cdot T(\mathbf{x}) \text{ or } \lvert k \rvert \cdot T(\mathbf{x}) \quad \text{for } k \neq 0 \qquad T(k \cdot \mathbf{x}, k \cdot \mathbf{y}) = k \cdot T(\mathbf{x}, \mathbf{y}) \text{ or } \lvert k \rvert \cdot T(\mathbf{x}, \mathbf{y}) \quad \text{for } k \neq 0

| | Location | Scale |
|---|---|---|
| Center | Equivariant | Equivariant |
| Spread | Invariant | Equivariant |
| RelSpread | | Invariant |
| Shift | Invariant | Equivariant |
| Ratio | | Invariant |
| AvgSpread | Invariant | Equivariant |
| Disparity | Invariant | Invariant |
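These properties are easy to verify numerically. The sketch below (illustrative Python) uses Median and MAD as computationally simple stand-ins with the same invariance classes as Center and Spread: Median is location- and scale-equivariant, MAD is location-invariant and scale-equivariant. The chosen values are exactly representable, so exact float equality holds:

```python
# Numerical check of location/scale behavior for Median and MAD.
import statistics

def mad(x):
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])

x = [1.0, 3.0, 4.0, 7.0, 9.0]
k = 10.0

# Location: Median shifts with the data (equivariant), MAD is unchanged (invariant)
assert statistics.median([v + k for v in x]) == statistics.median(x) + k
assert mad([v + k for v in x]) == mad(x)

# Scale: both scale proportionally with the data (equivariant)
assert statistics.median([v * k for v in x]) == k * statistics.median(x)
assert mad([v * k for v in x]) == k * mad(x)
print("invariance checks passed")
```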

Reframings

From Statistical Efficiency to Drift

Statistical efficiency measures estimator precision (Serfling 2009). When multiple estimators target the same quantity, efficiency determines which provides more reliable results.

Efficiency measures how tightly estimates cluster around the true value across repeated samples. For an estimator T applied to samples from distribution X, absolute efficiency is defined relative to the optimal estimator T^*:

\text{Efficiency}(T, X) = \frac{\text{Var}[T^*(X_1, \ldots, X_n)]}{\text{Var}[T(X_1, \ldots, X_n)]}

Relative efficiency compares two estimators by taking the ratio of their variances:

\text{RelativeEfficiency}(T_1, T_2, X) = \frac{\text{Var}[T_2(X_1, \ldots, X_n)]}{\text{Var}[T_1(X_1, \ldots, X_n)]}

Under \underline{\operatorname{Additive}} (Normal) distributions, this approach works well. The sample mean achieves optimal efficiency, while the median operates at roughly 64% efficiency.

However, this variance-based definition creates four critical limitations:

  • Absolute efficiency requires knowing the optimal estimator, which is difficult to determine. For many distributions, deriving the minimum-variance unbiased estimator requires complex mathematical analysis. Without this reference point, absolute efficiency cannot be computed.
  • Relative efficiency only compares estimator pairs, preventing systematic evaluation. This limits understanding of how multiple estimators perform relative to each other. Practitioners cannot rank estimators comprehensively or evaluate individual performance in isolation.
  • The approach depends on variance calculations that break down when variance becomes infinite or when distributions have heavy tails. Many real-world distributions, such as those with power-law tails, exhibit infinite variance. When the variance is undefined, efficiency comparisons become impossible.
  • Variance is not robust to outliers, which can corrupt efficiency calculations. A single extreme observation can greatly inflate variance estimates. This sensitivity can make efficient estimators look inefficient and vice versa.

The \operatorname{Drift} concept provides a robust alternative. Drift measures estimator precision using \operatorname{Spread} instead of variance, providing reliable comparisons across a wide range of distributions.

For an average estimator T, random variable X, and sample size n:

\operatorname{AvgDrift}(T, X, n) = \frac{\sqrt{n} \cdot \operatorname{Spread}[T(X_1, \ldots, X_n)]}{\operatorname{Spread}[X]}

This formula measures estimator variability compared to data variability. \operatorname{Spread}[T(X_1, \ldots, X_n)] captures the median absolute difference between estimates across repeated samples. Multiplying by \sqrt{n} removes sample size dependency, making drift values comparable across different sample sizes. Dividing by \operatorname{Spread}[X] creates a scale-free measure that provides consistent drift values across different distribution parameters and measurement units.
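The definition can be checked by simulation. This illustrative Python sketch (repetition counts and sample sizes kept small for speed; function names are mine, not toolkit API) estimates AvgDrift for Mean and Median under a standard normal distribution, where theory predicts values near 1.0 and 1.25:

```python
# Monte Carlo sketch of AvgDrift under an Additive (Normal) distribution.
# The Spread of the estimates across repeated samples is scaled by sqrt(n)
# and normalized by the Spread of the distribution itself (estimated here
# from a single large sample).
import random
import statistics
from itertools import combinations

def spread(x):
    # Shamos estimator: median of pairwise absolute differences
    return statistics.median([abs(a - b) for a, b in combinations(x, 2)])

def avg_drift(estimator, n, reps=300, rng=random.Random(42)):
    estimates = [estimator([rng.gauss(0, 1) for _ in range(n)])
                 for _ in range(reps)]
    spread_x = spread([rng.gauss(0, 1) for _ in range(500)])
    return (n ** 0.5) * spread(estimates) / spread_x

drift_mean = avg_drift(statistics.mean, 30)
drift_median = avg_drift(statistics.median, 30)
print(drift_mean, drift_median)  # mean drift near 1.0, median drift near 1.25
```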

Dispersion estimators use a parallel formulation:

\operatorname{DispDrift}(T, X, n) = \sqrt{n} \cdot \operatorname{RelSpread}[T(X_1, \ldots, X_n)]

Here \operatorname{RelSpread} normalizes by the estimator's typical value for fair comparison.

Drift offers four key advantages:

  • For estimators with \sqrt{n} convergence rates, drift remains finite and comparable across distributions; for heavier tails, drift may diverge, flagging estimator instability.
  • It provides absolute precision measures rather than only pairwise comparisons.
  • The robust \operatorname{Spread} foundation resists outlier distortion that corrupts variance-based calculations.
  • The \sqrt{n} normalization makes drift values comparable across different sample sizes, enabling direct comparison of estimator performance regardless of sample size.

Under \underline{\operatorname{Additive}} (Normal) conditions, drift matches traditional efficiency. The sample mean achieves drift near 1.0; the median achieves drift around 1.25. This consistency validates drift as a proper generalization of efficiency that extends to realistic data conditions where traditional efficiency fails.

When switching from one estimator to another while maintaining the same precision, the required sample size adjustment follows:

n_{\text{new}} = n_{\text{original}} \cdot \frac{\operatorname{Drift}^2(T_2, X)}{\operatorname{Drift}^2(T_1, X)}

This applies when estimator T_1 has lower drift than T_2.

The ratio of squared drifts determines the data requirement change. If T_2 has drift 1.5 times higher than T_1, then T_2 requires (1.5)^2 = 2.25 times more data to match T_1's precision. Conversely, switching to a more precise estimator allows smaller sample sizes.
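As a worked example of this rule (the helper name is hypothetical, chosen only for this illustration):

```python
# Sample-size adjustment from the squared-drift ratio: switching from
# estimator T1 to T2 with 1.5x the drift requires (1.5)^2 = 2.25x the data.
def adjusted_sample_size(n_original, drift_t1, drift_t2):
    return n_original * (drift_t2 / drift_t1) ** 2

print(adjusted_sample_size(100, 1.0, 1.5))  # -> 225.0
```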

For asymptotic analysis, \operatorname{Drift}(T, X) denotes the limiting value as n \to \infty. With a baseline estimator, rescaled drift values enable direct comparisons:

\operatorname{Drift}_{\text{baseline}}(T, X) = \operatorname{Drift}(T, X) / \operatorname{Drift}(T_{\text{baseline}}, X)

The standard drift definition assumes \sqrt{n} convergence rates typical under \underline{\operatorname{Additive}} (Normal) conditions. For broader applicability, drift generalizes to:

\operatorname{AvgDrift}(T, X, n) = \frac{n^{\text{instability}} \cdot \operatorname{Spread}[T(X_1, \ldots, X_n)]}{\operatorname{Spread}[X]} \qquad \operatorname{DispDrift}(T, X, n) = n^{\text{instability}} \cdot \operatorname{RelSpread}[T(X_1, \ldots, X_n)]

The instability parameter adapts to estimator convergence rates. The toolkit uses \text{instability} = \frac{1}{2} throughout because this choice provides natural intuition and mental representation for the \underline{\operatorname{Additive}} (Normal) distribution. Rather than introduce additional complexity through variable instability parameters, the fixed \sqrt{n} scaling offers practical convenience while maintaining theoretical rigor for the distribution classes most common in applications.

From Confidence Level to Misrate

Traditional statistics expresses uncertainty through confidence levels: "95% confidence interval", "99% confidence", "99.9% confidence". This convention emerged from early statistical practice when tables printed confidence intervals for common levels like 90%, 95%, and 99%.

The confidence level approach creates practical problems:

  • Cognitive difficulty with high confidence. Distinguishing between 99.999% and 99.9999% confidence requires mental effort. The difference matters — one represents a 1-in-100,000 error rate, the other 1-in-1,000,000 — but the representation obscures this distinction.
  • Asymmetric scale. The confidence level scale compresses near 100%, where most practical values cluster. Moving from 90% to 95% represents a 2× change in error rate, while moving from 99% to 99.9% represents a 10× change, despite similar visual spacing.
  • Indirect interpretation. Practitioners care about error rates, not success rates. "What's the chance I'm wrong?" matters more than "What's the chance I'm right?" Confidence level forces mental subtraction to answer the natural question.
  • Unclear defaults. Traditional practice offers no clear default confidence level. Different fields use different conventions (95%, 99%, 99.9%), creating inconsistency and requiring arbitrary choices.

The \mathrm{misrate} parameter provides a more natural representation. Misrate expresses the probability that computed bounds fail to contain the true value:

\mathrm{misrate} = 1 - \text{confidence level}

This simple inversion provides several advantages:

  • Direct interpretation. \mathrm{misrate} = 0.01 means 1% chance of error, or wrong 1 time in 100. \mathrm{misrate} = 10^{-6} means wrong 1 time in a million. No mental arithmetic required.
  • Linear scale for practical values. \mathrm{misrate} = 0.1 (10%), \mathrm{misrate} = 0.01 (1%), \mathrm{misrate} = 0.001 (0.1%) form a natural sequence. Scientific notation handles extreme values cleanly: 10^{-3}, 10^{-6}, 10^{-9}.
  • Clear comparisons. 10^{-5} versus 10^{-6} immediately shows a 10× difference in error tolerance. 99.999% versus 99.9999% confidence obscures this same relationship.
  • Pragmatic default. The toolkit recommends \mathrm{misrate} = 10^{-3} (one-in-a-thousand error rate) as a reasonable default for everyday analysis. For critical decisions where errors are costly, use \mathrm{misrate} = 10^{-6} (one-in-a-million).
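The inversion itself is a one-liner in either direction (illustrative Python; these helper names are not part of the toolkit):

```python
# Conversions between the traditional confidence level and misrate.
def misrate_from_confidence(confidence_level):
    return 1.0 - confidence_level

def confidence_from_misrate(misrate):
    return 1.0 - misrate

print(misrate_from_confidence(0.95))  # ~0.05 (wrong 1 time in 20)
print(confidence_from_misrate(1e-3))  # ~0.999
```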

The terminology shift from confidence level to misrate parallels other clarifying renames in this toolkit. Just as \underline{\operatorname{Additive}} better describes the distribution's formation than "Normal", and \operatorname{Center} better describes the estimator's purpose than "Hodges-Lehmann", \mathrm{misrate} better describes the quantity practitioners actually reason about: the probability of error.

Traditional confidence intervals become bounds in this framework, eliminating statistical jargon in favor of descriptive terminology. \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) clearly indicates: it provides bounds on the shift, with a specified error rate. No background in classical statistics is required to understand the concept.

From Mann-Whitney U-test to Pairwise Margin

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) ranks among the most widely used non-parametric statistical tests, testing whether two independent samples come from the same distribution. Under \underline{\operatorname{Additive}} (Normal) conditions, it achieves nearly the same precision as Student's t-test, while maintaining reliability under diverse distributional conditions where the t-test fails.

The test operates by comparing all pairs of measurements between the two samples. Given samples \mathbf{x} = (x_1, \ldots, x_n) and \mathbf{y} = (y_1, \ldots, y_m), the Mann-Whitney U statistic counts how many pairs satisfy x_i > y_j:

U = \sum_{i=1}^n \sum_{j=1}^m \mathbb{1}(x_i > y_j)

If the samples come from the same distribution, U should be near \frac{nm}{2} (roughly half the pairs favor \mathbf{x}, half favor \mathbf{y}). Large deviations from \frac{nm}{2} suggest the distributions differ.
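The definition translates directly into code (a brute-force O(nm) Python sketch; the sample values are invented):

```python
# Mann-Whitney U as a direct count of pairwise comparisons.
def mann_whitney_u(x, y):
    return sum(1 for xi in x for yj in y if xi > yj)

x = [5.1, 6.2, 7.3, 8.4]
y = [4.0, 5.5, 6.0]
# n * m = 12 pairs total, so the "no difference" baseline is 6
print(mann_whitney_u(x, y))  # -> 10
```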

The test answers: "Could this U value arise by chance if the samples were truly equivalent?" The p-value quantifies this probability. If p < 0.05, traditional practice declares the difference "statistically significant".

This approach creates several problems for practitioners:

  • Binary thinking. The test produces a yes/no answer: reject or fail to reject the null hypothesis. Practitioners typically want to know the magnitude of difference, not just whether one exists.
  • Arbitrary thresholds. The 0.05 threshold has no universal justification, yet it dominates practice and creates a false dichotomy between p = 0.049 and p = 0.051.
  • Hypothesis-centric framework. The test assumes a null hypothesis of no difference and evaluates evidence against it. Real questions rarely concern exact equality; practitioners want to know "how different?" rather than "different or not?"
  • Inverted logic. The natural question is "what shifts are consistent with my data?" The test answers "is this specific shift (zero) consistent with my data?"

The toolkit inverts this framework. Instead of testing whether a hypothesized shift is plausible, we compute which shifts are plausible given the data. This inversion transforms hypothesis testing into bounds estimation.

The mathematical foundation remains the same. The distribution of pairwise comparisons under random sampling determines which order statistics of pairwise differences form reliable bounds. The Mann-Whitney U statistic measures pairwise comparisons (x_i > y_j). The \operatorname{Shift} estimator uses pairwise differences (x_i - y_j). These quantities are mathematically related: a pairwise difference x_i - y_j is positive exactly when x_i > y_j. The toolkit renames this comparison count from U to \text{Dominance}(\mathbf{x}, \mathbf{y}), clarifying its purpose: measuring how often one sample dominates the other in pairwise comparisons.
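The pairwise-difference relationship can be made explicit in a short sketch (illustrative Python; the function names are mine, not toolkit API): Dominance counts the positive pairwise differences, while Shift is the median of those same differences.

```python
# Dominance and Shift computed from the same pairwise differences.
import statistics

def pairwise_differences(x, y):
    return [xi - yj for xi in x for yj in y]

def dominance(x, y):
    # equals the Mann-Whitney U: count of pairs with x_i > y_j
    return sum(1 for d in pairwise_differences(x, y) if d > 0)

def shift(x, y):
    # median of the same pairwise differences
    return statistics.median(pairwise_differences(x, y))

x = [5.1, 6.2, 7.3, 8.4]
y = [4.0, 5.5, 6.0]
print(dominance(x, y), shift(x, y))
```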

The distribution of \text{Dominance} determines which order statistics form reliable bounds. Define the margin function:

\operatorname{PairwiseMargin}(n, m, \mathrm{misrate}) = \text{number of pairwise differences to exclude from bounds}

This function computes how many extreme pairwise differences could occur by chance with probability \mathrm{misrate}, based on the distribution of pairwise comparisons.

The \operatorname{PairwiseMargin} function requires knowing the distribution of pairwise comparisons under sampling. Two computational approaches exist:

  • Exact computation (Löffler's algorithm, 1982). Uses a recurrence relation to compute the exact distribution of pairwise comparisons for small samples. Practical for combined sample sizes up to several hundred.
  • Approximation (Edgeworth expansion, 1955). Refines the normal approximation with correction terms based on higher moments of the distribution. Provides accurate results for large samples where exact computation becomes impractical.

The toolkit automatically selects the appropriate method based on sample sizes, ensuring both accuracy and computational efficiency.
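For small samples, the exact null distribution of the comparison count can be sketched with the classic recurrence N(u; n, m) = N(u-m; n-1, m) + N(u; n, m-1), conditioning on whether the largest observation comes from x (contributing m comparisons) or from y (contributing none). This is the textbook form, not Löffler's memory-optimized reformulation, and the function names are illustrative:

```python
# Exact null distribution of the pairwise comparison count (U)
# via the classic recurrence, for small samples.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count(u, n, m):
    # number of arrangements of n x's and m y's with exactly u pairs x > y
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    # largest value is an x (beats all m y's) or a y (beats none)
    return count(u - m, n - 1, m) + count(u, n, m - 1)

def u_pmf(n, m):
    total = comb(n + m, n)  # all arrangements are equally likely under the null
    return [count(u, n, m) / total for u in range(n * m + 1)]

pmf = u_pmf(4, 3)
print(sum(pmf))  # probabilities sum to 1
```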

This approach naturally complements \operatorname{Center} and \operatorname{Spread}:

  • \operatorname{Center}(\mathbf{x}) uses the median of pairwise averages \frac{x_i + x_j}{2}
  • \operatorname{Spread}(\mathbf{x}) uses the median of pairwise differences \lvert x_i - x_j \rvert
  • \operatorname{Shift}(\mathbf{x}, \mathbf{y}) uses the median of pairwise differences x_i - y_j
  • \operatorname{ShiftBounds}(\mathbf{x}, \mathbf{y}, \mathrm{misrate}) uses order statistics of the same pairwise differences

All procedures build on pairwise operations. This structural consistency reflects the mathematical unity underlying robust statistics: pairwise operations provide natural robustness while maintaining computational feasibility and statistical efficiency.

The inversion from hypothesis testing to bounds estimation represents a philosophical shift in statistical practice. Traditional methods ask "should I believe this specific hypothesis?" Pragmatic methods ask "what should I believe, given this data?" Bounds provide actionable answers: they tell practitioners which values are plausible, enabling informed decisions without arbitrary significance thresholds.

Traditional Mann-Whitney implementations apply tie correction when samples contain repeated values. This correction modifies variance calculations to account for tied observations, changing p-values and confidence intervals in ways that depend on measurement precision. The toolkit deliberately omits tie correction. Continuous distributions produce theoretically distinct values; observed ties result from finite measurement precision and digital representation. When measurements appear identical, this reflects rounding of underlying continuous variation, not true equality in the measured quantity. Treating ties as artifacts of discretization rather than distributional features simplifies computation while maintaining accuracy. The exact and approximate methods compute comparison distributions without requiring adjustments for tied values, eliminating a source of complexity and potential inconsistency in statistical practice.

Historical Development

The mathematical foundations emerged through decades of refinement. Mann and Whitney (1947) established the distribution of pairwise comparisons under random sampling, creating the theoretical basis for comparing samples through rank-based methods. Their work demonstrated that comparison counts follow predictable patterns regardless of the underlying population distributions.

The original computational approaches suffered from severe limitations. Mann and Whitney proposed a slow exact method requiring exponential resources and a normal approximation that proved grossly inaccurate for practical use. The approximation works reasonably in distribution centers but fails catastrophically in the tails where practitioners most need accuracy. For moderate sample sizes, approximation errors can exceed factors of 10^{11}.

Fix and Hodges (1955) addressed the approximation problem through higher-order corrections. Their expansion adds terms based on the distribution's actual moments rather than assuming perfect normality. This refinement reduces tail probability errors from orders of magnitude to roughly 1%, making approximation practical for large samples where exact computation becomes infeasible.

Löffler (1982) solved the exact computation problem through algorithmic innovation. The naive recurrence requires quadratic memory, which is infeasible for samples beyond a few dozen measurements. Löffler discovered a reformulation that reduces memory to linear scale, making exact computation practical for combined sample sizes up to several hundred.

Despite these advances, most statistical software continues using the 1947 approximation. The computational literature contains the solutions, but software implementations lag decades behind theoretical developments. This toolkit implements both the exact method for small samples and the refined approximation for large samples, automatically selecting the appropriate approach based on sample sizes.

The shift from hypothesis testing to bounds estimation requires no new mathematics. The same comparison distributions that enable hypothesis tests also determine which order statistics form reliable bounds. Traditional applications ask "is zero plausible?" and answer yes or no. This toolkit asks "which values are plausible?" and answers with an interval. The perspective inverts while the mathematical foundation remains identical.

Notes

On Bootstrap for Center Bounds

A natural question arises: can bootstrap resampling improve \operatorname{CenterBounds} coverage for asymmetric distributions where the weak symmetry assumption fails?

The idea is appealing. The signed-rank approach computes bounds from order statistics of Walsh averages using a margin derived from the Wilcoxon distribution, which assumes symmetric deviations from the center. Bootstrap makes no symmetry assumption: resample the data with replacement, compute \operatorname{Center} on each resample, and take quantiles of the bootstrap distribution as bounds. This should yield valid bounds regardless of distributional shape.

This manual deliberately does not provide a bootstrap-based alternative to \operatorname{CenterBounds}. The reasons are both computational and statistical.

Computational cost

\operatorname{CenterBounds} computes bounds in O(n \log n) time: a single pass through the Walsh averages guided by the signed-rank margin. No resampling, no iteration.

A bootstrap version requires B resamples (typically B = 10000 for stable tail quantiles), each computing \operatorname{Center} on the resample. \operatorname{Center} itself costs O(n \log n) via the fast selection algorithm on the implicit pairwise matrix. The total cost becomes O(B \cdot n \log n) per call — roughly 10000\times slower than the signed-rank approach.

For n = 5, each call to \operatorname{Center} operates on 15 Walsh averages. The bootstrap recomputes this 10000 times. The computation is not deep — it is merely wasteful. For n = 100, there are 5050 Walsh averages per resample, and 10000 resamples produce 5 \times 10^7 selection operations per bounds call. In a simulation study that evaluates bounds across many samples, this cost becomes prohibitive.

Statistical quality

Bootstrap bounds are nominal, not exact. The percentile method has well-documented undercoverage for small samples: requesting 95% confidence (\mathrm{misrate} = 0.05) typically yields 85–92% actual coverage for n < 30. This is inherent to the bootstrap percentile method — the quantile estimates from B resamples are biased toward the sample and underrepresent tail behavior. Refined methods (BCa, bootstrap-t) partially address this but add complexity and still provide only asymptotic guarantees.

Meanwhile, \operatorname{CenterBounds} provides exact distribution-free coverage under symmetry. For n = 5 requesting \mathrm{misrate} = 0.1, the signed-rank method delivers exactly 10\% misrate. A bootstrap method, requesting the same 10\%, typically delivers 12–15\% misrate. The exact method is simultaneously faster and more accurate.

Behavior under asymmetry

Under asymmetric distributions, the signed-rank margin is no longer calibrated: the Wilcoxon distribution assumes symmetric deviations, and asymmetry shifts the actual distribution of comparison counts.

However, the coverage degradation is gradual, not catastrophic. Mild asymmetry produces mild coverage drift. The bounds remain meaningful — they still bracket the pseudomedian using order statistics of the Walsh averages — but the actual misrate differs from the requested value.

This is the same situation as bootstrap, which also provides only approximate coverage. The practical difference is that the signed-rank approach achieves this approximate coverage in O(n \log n) time, while bootstrap achieves comparable approximate coverage in O(B \cdot n \log n) time.

Why not both?

One might argue for providing both methods: the signed-rank approach as default, and a bootstrap variant for cases where symmetry is severely violated.

This creates a misleading choice. If the bootstrap method offered substantially better coverage under asymmetry, the complexity would be justified. But for the distributions practitioners encounter (\underline{\operatorname{Multiplic}}, \underline{\operatorname{Exp}}, and other moderate asymmetries), the coverage difference between the two approaches is small relative to the 10000\times cost difference. For extreme asymmetries where the signed-rank coverage genuinely breaks down, the sign test provides an alternative foundation for median bounds (see the study on misrate efficiency of MedianBounds), but its O(n^{-\frac{1}{2}}) efficiency convergence makes it impractical for moderate sample sizes.

The toolkit therefore provides CenterBounds\operatorname{CenterBounds} as the single bounds estimator. The weak symmetry assumption means the method performs well under approximate symmetry and degrades gracefully under moderate asymmetry. There is no useful middle ground that justifies a 10000×10000 \times computational penalty for marginally different approximate coverage.

On Misrate Efficiency of MedianBounds

This study analyzes $\operatorname{MedianBounds}$, a bounds estimator for the population median based on the sign test, and explains why pragmastat omits it in favor of $\operatorname{CenterBounds}$.

Definition

$$\operatorname{MedianBounds}(\mathbf{x}, \mathrm{misrate}) = [x_{(k)}, x_{(n-k+1)}]$$

where $k$ is the largest integer satisfying $2 \cdot \Pr(B \leq k-1) \leq \mathrm{misrate}$ and $B \sim \text{Binomial}(n, 0.5)$. The interval brackets the population median using order statistics, with $\mathrm{misrate}$ controlling the probability that the true median falls outside the bounds.

$\operatorname{MedianBounds}$ requires no symmetry assumption — only weak continuity — making it applicable to arbitrarily skewed distributions. This is its principal advantage over $\operatorname{CenterBounds}$, which assumes weak symmetry.
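The definition above translates directly into code via the binomial CDF. The following is an illustrative sketch, not the toolkit's actual API; the function name and return convention are hypothetical:

```python
import math


def median_bounds(x, misrate):
    """Sign-test bounds for the population median: [x_(k), x_(n-k+1)],
    where k is the largest integer with 2 * Pr(B <= k - 1) <= misrate
    and B ~ Binomial(n, 0.5)."""
    xs = sorted(x)
    n = len(xs)

    def binom_cdf(m):  # Pr(B <= m) for B ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(m + 1)) / 2 ** n

    k = 0
    # candidate k + 1 is valid when 2 * Pr(B <= (k + 1) - 1) <= misrate
    while k < n // 2 and 2 * binom_cdf(k) <= misrate:
        k += 1
    if k == 0:
        return None  # even [x_(1), x_(n)] exceeds the requested misrate
    return xs[k - 1], xs[n - k]  # 1-based order statistics x_(k), x_(n-k+1)
```

For example, `median_bounds(list(range(1, 101)), 0.05)` selects $k = 40$ and returns the 40th and 61st order statistics, matching the classical nonparametric confidence interval for the median at $n = 100$.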

Sign test foundation

The method is equivalent to inverting the sign test. Under weak continuity, each observation independently falls above or below the true median with probability $\frac{1}{2}$. The number of observations below the median follows $\text{Binomial}(n, 0.5)$, and the order statistics $x_{(k)}$ and $x_{(n-k+1)}$ form a confidence interval whose coverage is determined exactly by the binomial CDF.

Because the binomial CDF is a step function, the achievable misrate values form a discrete set. The algorithm rounds down to the nearest achievable level, inevitably wasting part of the requested misrate budget. This study derives the resulting efficiency loss and its convergence rate.

Achievable misrate levels

The achievable misrate values for sample size $n$ are:

$$m_k = 2 \cdot \Pr(B \leq k), \quad k = 0, 1, 2, \dots$$

The algorithm selects the largest $k$ satisfying $m_k \leq \mathrm{misrate}$. The efficiency $\eta = \frac{m_k}{\mathrm{misrate}}$ measures how much of the requested budget is used. Efficiency $\eta = 1$ means the bounds are as tight as the misrate allows; $\eta = 0.5$ means half the budget is wasted, producing unnecessarily wide bounds.
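The achievable levels and the resulting efficiency can be tabulated exactly; a minimal sketch (function names are illustrative):

```python
import math


def achievable_misrates(n):
    """Discrete levels m_k = 2 * Pr(B <= k) for B ~ Binomial(n, 0.5)."""
    cdf, levels = 0.0, []
    for k in range(n // 2 + 1):
        cdf += math.comb(n, k) / 2 ** n
        levels.append(2 * cdf)
    return levels


def efficiency(n, misrate):
    """eta = m_k / misrate for the largest achievable m_k <= misrate."""
    usable = [m for m in achievable_misrates(n) if m <= misrate]
    return max(usable) / misrate if usable else 0.0
```

For $n = 5$ the only level below $0.1$ is $m_0 = 2/32 = 0.0625$, so $\eta = 0.625$: over a third of the requested budget is wasted.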

Spacing between consecutive levels

The gap between consecutive achievable misrates is:

$$\Delta m_k = m_{k+1} - m_k = 2 \cdot \Pr(B = k+1) = 2 \cdot \binom{n}{k+1} / 2^n$$

For a target misrate $\alpha$, the relevant index $k$ satisfies $m_k \approx \alpha$. By the normal approximation to the binomial, $B \approx \mathcal{N}(\frac{n}{2}, \frac{n}{4})$, the binomial CDF near this index changes by approximately:

$$\Delta m \approx \frac{4 \phi(z_\alpha)}{\sqrt{n}}$$

where $z_\alpha = \Phi^{-1}(\frac{\alpha}{2})$ is the corresponding standard normal quantile and $\phi$ is the standard normal density. This spacing governs how coarsely the achievable misrates are distributed near the target.
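The approximation can be checked against the exact binomial gap; a small sketch (the index formula and sample size are illustrative choices):

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def exact_gap(n, k):
    """Delta m_k = 2 * Pr(B = k + 1) for B ~ Binomial(n, 0.5)."""
    return 2 * math.comb(n, k + 1) / 2 ** n


n, z_alpha = 1000, -1.645                      # alpha = 0.1
k = round(n / 2 + z_alpha * math.sqrt(n) / 2)  # index where m_k is near alpha
approx = 4 * phi(z_alpha) / math.sqrt(n)
```

At $n = 1000$ the exact gap and the $\frac{4 \phi(z_\alpha)}{\sqrt{n}}$ approximation agree to within roughly $15\%$; the statement is asymptotic, so agreement improves with $n$.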

Expected efficiency

The requested misrate $\alpha$ falls at a uniformly random position within a gap of width $\Delta m$. On average, the algorithm wastes $\frac{\Delta m}{2}$, giving expected efficiency:

$$\mathbb{E}[\eta] \approx 1 - \frac{\Delta m}{2 \alpha} = 1 - \frac{2 \phi(z_\alpha)}{\alpha \sqrt{n}}$$

Define the misrate-dependent constant:

$$C(\alpha) = \frac{2 \phi(\Phi^{-1}(\frac{\alpha}{2}))}{\alpha}$$

Then the expected efficiency has the form:

$$\mathbb{E}[\eta] \approx 1 - \frac{C(\alpha)}{\sqrt{n}}$$

The convergence rate is $O(n^{-\frac{1}{2}})$: efficiency improves as the square root of sample size.

Values of $C(\alpha)$

The constant $C(\alpha)$ increases for smaller misrates, meaning tighter error tolerances require proportionally larger samples for efficient bounds:

| $\alpha$ | $0.1$ | $0.05$ | $0.01$ | $0.005$ | $0.001$ |
|---|---|---|---|---|---|
| $z_\alpha$ | $-1.64$ | $-1.96$ | $-2.58$ | $-2.81$ | $-3.29$ |
| $\phi(z_\alpha)$ | $0.103$ | $0.058$ | $0.014$ | $0.008$ | $0.002$ |
| $C(\alpha)$ | $2.06$ | $2.34$ | $2.89$ | $3.10$ | $3.55$ |

For $\alpha = 0.1$ and $n = 50$: $\mathbb{E}[\eta] \approx 1 - \frac{2.06}{\sqrt{50}} \approx 0.71$. Achieving $90\%$ efficiency on average requires $n > \left(\frac{C(\alpha)}{0.1}\right)^2$. For $\alpha = 0.1$ this gives $n > 424$; for $\alpha = 0.001$ this gives $n > 1260$.
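The constants in the table can be reproduced with the standard library; a quick check using `statistics.NormalDist` (the function name is illustrative):

```python
import math
from statistics import NormalDist


def C(alpha):
    """C(alpha) = 2 * phi(Phi^{-1}(alpha / 2)) / alpha."""
    z = NormalDist().inv_cdf(alpha / 2)
    density = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return 2 * density / alpha


for alpha in (0.1, 0.05, 0.01, 0.005, 0.001):
    print(alpha, round(C(alpha), 2))
```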

Comparison with CenterBounds

$\operatorname{CenterBounds}$ uses the signed-rank statistic $W$ with range $[0, \frac{n(n+1)}{2}]$. Under the null hypothesis, $W$ has variance $\sigma^2 = \frac{n(n+1)(2n+1)}{24} \approx \frac{n^3}{12}$. The CDF spacing at the relevant quantile is:

$$\Delta m_W \approx \frac{2 \sqrt{12} \cdot \phi(z_\alpha)}{n^{\frac{3}{2}}}$$

The expected efficiency for $\operatorname{CenterBounds}$ is therefore:

$$\mathbb{E}[\eta_W] \approx 1 - \frac{\sqrt{12} \cdot \phi(z_\alpha)}{\alpha \cdot n^{\frac{3}{2}}}$$

This converges at rate $O(n^{-\frac{3}{2}})$ — faster than the $O(n^{-\frac{1}{2}})$ rate of $\operatorname{MedianBounds}$ by a full factor of $n$. The difference arises because the signed-rank distribution has $\frac{n(n+1)}{2}$ discrete levels compared to the binomial's $n$ levels, providing fundamentally finer resolution.
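The resolution difference is visible by tabulating the exact null distribution of $W$; a small sketch (the function name is illustrative):

```python
def signed_rank_pmf(n):
    """Null distribution of the Wilcoxon signed-rank statistic W:
    each rank 1..n enters the positive-rank sum independently with
    probability 1/2, so the pmf is built by convolution."""
    dist = [1.0]
    for r in range(1, n + 1):
        new = [0.0] * (len(dist) + r)
        for w, p in enumerate(dist):
            new[w] += p / 2
            new[w + r] += p / 2
        dist = new
    return dist  # dist[w] = Pr(W = w) for w = 0 .. n(n+1)/2


# at n = 10: the sign test offers 11 CDF levels, the signed-rank statistic 56
```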

Why pragmastat omits MedianBounds

The efficiency loss of $\operatorname{MedianBounds}$ is not an implementation artifact. It reflects a structural limitation of the sign test: using only the signs of $(X_i - \theta)$ discards magnitude information, leaving only $n$ binary observations to determine coverage. The signed-rank test used by $\operatorname{CenterBounds}$ exploits both signs and ranks, producing $\frac{n(n+1)}{2}$ comparison outcomes and correspondingly finer misrate resolution.

For applications requiring tight misrate control on the median, large samples ($n > 500$) are needed to ensure efficient use of the misrate budget. For smaller samples, the bounds remain valid but conservative: the actual misrate is guaranteed not to exceed the requested value, even though it may be substantially below it.

$\operatorname{CenterBounds}$ with its $O(n^{-\frac{3}{2}})$ convergence achieves near-continuous misrate control even for moderate $n$, at the cost of requiring weak symmetry. For the distributions practitioners typically encounter, this tradeoff favors $\operatorname{CenterBounds}$ as the single bounds estimator in the toolkit. When symmetry is severely violated, the coverage drift of $\operatorname{CenterBounds}$ is gradual — mild asymmetry produces mild drift — making it a robust default without the efficiency penalty inherent to the sign test approach.

Additive (Normal) Distribution

The $\underline{\operatorname{Additive}}$ (Normal) distribution has two parameters: the mean and the standard deviation, written as $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$.

Asymptotic Spread Value

Consider two independent draws $X$ and $Y$ from the $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$ distribution. The goal is to find the median of their absolute difference $\lvert X-Y \rvert$. Define the difference $D = X - Y$. By linearity of expectation, $\mathbb{E}[D] = 0$. By independence, $\operatorname{Var}[D] = 2 \cdot \mathrm{stdDev}^2$. Thus $D$ has distribution $\underline{\operatorname{Additive}}(0, \sqrt{2} \cdot \mathrm{stdDev})$, and the problem reduces to finding the median of $\lvert D \rvert$. The location parameter $\mathrm{mean}$ disappears, as expected, because absolute differences are invariant to shifts.

Let $\tau=\sqrt{2} \cdot \mathrm{stdDev}$, so that $D \sim \underline{\operatorname{Additive}}(0, \tau)$. The random variable $\lvert D \rvert$ then follows the Half-$\underline{\operatorname{Additive}}$ (Folded Normal) distribution with scale $\tau$. Its cumulative distribution function for $z \geq 0$ becomes

$$F_{\lvert D \rvert}(z) = \Pr(\lvert D \rvert \leq z) = 2 \Phi \left(\frac{z}{\tau}\right) - 1$$

where $\Phi$ denotes the standard $\underline{\operatorname{Additive}}$ (Normal) CDF.

The median $m$ is the point at which this CDF equals $\frac{1}{2}$. Setting $F_{\lvert D \rvert}(m)=\frac{1}{2}$ gives

$$2 \Phi \left(\frac{m}{\tau}\right) - 1 = \frac{1}{2} \Rightarrow \Phi \left(\frac{m}{\tau}\right) = \frac{3}{4}$$

Applying the inverse CDF yields $\frac{m}{\tau} = z_{0.75}$. Substituting back $\tau = \sqrt{2} \cdot \mathrm{stdDev}$ produces

$$\operatorname{Median}(\lvert X-Y \rvert) = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$$

Define $z_{0.75} := \Phi^{-1}(0.75) \approx 0.6744897502$. Numerically, the median absolute difference is approximately $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev} \approx 0.9538725524 \cdot \mathrm{stdDev}$. This expression depends only on the scale parameter $\mathrm{stdDev}$, not on the mean, reflecting the translation invariance of the problem.
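The constant is easy to confirm by Monte Carlo; a sketch with arbitrary seed, scale, and sample count:

```python
import random
import statistics

# check Median(|X - Y|) ~ sqrt(2) * z_0.75 * stdDev ~ 0.95387 * stdDev
random.seed(1)
stdDev = 3.0
m = statistics.median(
    # the mean cancels in the difference, so 0 is used for both draws
    abs(random.gauss(0, stdDev) - random.gauss(0, stdDev))
    for _ in range(200_000)
)
ratio = m / stdDev  # should land near 0.9539 regardless of stdDev
```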

Lemma: Average Estimator Drift Formula

For average estimators $T_n$ with asymptotic standard deviation $a \cdot \mathrm{stdDev} / \sqrt{n}$ around the mean $\mu$, define $\operatorname{RelSpread}[T_n] := \frac{\operatorname{Spread}[T_n]}{\operatorname{Spread}[X]}$. In the $\underline{\operatorname{Additive}}$ (Normal) case, $\operatorname{Spread}[X] = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$.

For any average estimator $T_n$ with asymptotic standard deviation $a \cdot \mathrm{stdDev} / \sqrt{n}$ around the mean $\mu$, the drift calculation follows:

  • The spread of two independent estimates: $\operatorname{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \mathrm{stdDev} / \sqrt{n}$
  • The relative spread: $\operatorname{RelSpread}[T_n] = \frac{a}{\sqrt{n}}$
  • The asymptotic drift: $\operatorname{Drift}(T, X) = a$

Asymptotic Mean Drift

For the sample mean $\operatorname{Mean}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n x_i$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of $\operatorname{Mean}$ is also $\underline{\operatorname{Additive}}$ with mean $\mathrm{mean}$ and standard deviation $\mathrm{stdDev}/\sqrt{n}$.

Using the lemma with $a = 1$ (since the standard deviation is $\mathrm{stdDev}/\sqrt{n}$):

$$\operatorname{Drift}(\operatorname{Mean}, X) = 1$$

$\operatorname{Mean}$ achieves unit drift under the $\underline{\operatorname{Additive}}$ (Normal) distribution, serving as the natural baseline for comparison. $\operatorname{Mean}$ is the optimal estimator under the $\underline{\operatorname{Additive}}$ (Normal) distribution: no other estimator achieves lower $\operatorname{Drift}$.

Asymptotic Median Drift

For the sample median $\operatorname{Median}(\mathbf{x})$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic sampling distribution of $\operatorname{Median}$ is approximately $\underline{\operatorname{Additive}}$ (Normal) with mean $\mathrm{mean}$ and standard deviation $\sqrt{\frac{\pi}{2}} \cdot \mathrm{stdDev}/\sqrt{n}$.

This result follows from the asymptotic theory of order statistics. For the median of a sample from a continuous distribution with density $f$ and cumulative distribution $F$, the asymptotic variance is $\frac{1}{4n[f(F^{-1}(0.5))]^2}$. For the $\underline{\operatorname{Additive}}$ (Normal) distribution with standard deviation $\mathrm{stdDev}$, the density at the median (which equals the mean) is $\frac{1}{\mathrm{stdDev} \sqrt{2 \pi}}$. Thus the asymptotic variance becomes $\frac{\pi \cdot \mathrm{stdDev}^2}{2n}$.

Using the lemma with $a = \sqrt{\frac{\pi}{2}}$:

$$\operatorname{Drift}(\operatorname{Median}, X) = \sqrt{\frac{\pi}{2}}$$

Numerically, $\sqrt{\frac{\pi}{2}} \approx 1.2533$, so the median has approximately 25% higher drift than the mean under the $\underline{\operatorname{Additive}}$ (Normal) distribution.

Asymptotic Center Drift

For the sample center $\operatorname{Center}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{Median}} \frac{x_i + x_j}{2}$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, its asymptotic sampling distribution must be determined.

The center estimator computes all pairwise averages (including $i=j$) and takes their median. For the $\underline{\operatorname{Additive}}$ (Normal) distribution, asymptotic theory shows that the center estimator is asymptotically $\underline{\operatorname{Additive}}$ (Normal) with mean $\mathrm{mean}$.

The exact asymptotic variance of the center estimator for the $\underline{\operatorname{Additive}}$ (Normal) distribution is:

$$\operatorname{Var}[\operatorname{Center}(X_{1:n})] = \frac{\pi \cdot \mathrm{stdDev}^2}{3n}$$

This gives an asymptotic standard deviation of:

$$\operatorname{StdDev}[\operatorname{Center}(X_{1:n})] = \sqrt{\frac{\pi}{3}} \cdot \mathrm{stdDev}/\sqrt{n}$$

Using the lemma with $a = \sqrt{\frac{\pi}{3}}$:

$$\operatorname{Drift}(\operatorname{Center}, X) = \sqrt{\frac{\pi}{3}}$$

Numerically, $\sqrt{\frac{\pi}{3}} \approx 1.0233$, so the center estimator achieves a drift very close to 1 under the $\underline{\operatorname{Additive}}$ (Normal) distribution, performing nearly as well as the mean while offering greater robustness to outliers.
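The three average-estimator drifts can be checked by Monte Carlo under the standard normal model. This is a sketch, not the toolkit's simulation code: function names, seeds, and sample counts are illustrative, and at $n = 50$ the estimates land near, but not exactly at, the asymptotic values:

```python
import random
import statistics

Z_075 = 0.6744897502  # z_0.75


def drift(estimator, n=50, pairs=2000, seed=7):
    """Monte Carlo sketch of Drift(T, X) = sqrt(n) * Spread[T_n] / Spread[X]
    under the standard Additive (Normal) model, Spread[X] = sqrt(2) * z_0.75.
    Spread[T_n] is the median absolute difference of independent estimate pairs."""
    rng = random.Random(seed)

    def est():
        return estimator([rng.gauss(0, 1) for _ in range(n)])

    spread_t = statistics.median(abs(est() - est()) for _ in range(pairs))
    return n ** 0.5 * spread_t / (2 ** 0.5 * Z_075)


def center(xs):
    """Hodges-Lehmann estimator: median of all pairwise averages, i <= j."""
    return statistics.median(
        (xs[i] + xs[j]) / 2 for i in range(len(xs)) for j in range(i, len(xs))
    )


# asymptotic targets: Mean -> 1, Median -> 1.2533, Center -> 1.0233
```

Running `drift(statistics.fmean)`, `drift(statistics.median)`, and `drift(center)` reproduces the ordering Mean < Center < Median.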

Lemma: Dispersion Estimator Drift Formula

For dispersion estimators $T_n$ with asymptotic center $b \cdot \mathrm{stdDev}$ and standard deviation $a \cdot \mathrm{stdDev} / \sqrt{n}$, define $\operatorname{RelSpread}[T_n] := \frac{\operatorname{Spread}[T_n]}{b \cdot \mathrm{stdDev}}$.

For any dispersion estimator $T_n$ with asymptotic distribution $T_n \approx \underline{\operatorname{Additive}}(b \cdot \mathrm{stdDev},\ a \cdot \mathrm{stdDev}/\sqrt{n})$, the drift calculation follows:

  • The spread of two independent estimates: $\operatorname{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \mathrm{stdDev} / \sqrt{n}$
  • The relative spread: $\operatorname{RelSpread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b \sqrt{n}}$
  • The asymptotic drift: $\operatorname{Drift}(T, X) = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b}$

Note: The $\sqrt{2}$ factor comes from the standard deviation of the difference $D = T_1 - T_2$ of two independent estimates, and the $z_{0.75}$ factor converts this standard deviation to the median absolute difference.

Asymptotic StdDev Drift

For the sample standard deviation $\operatorname{StdDev}(\mathbf{x}) = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \operatorname{Mean}(\mathbf{x}))^2}$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of $\operatorname{StdDev}$ is approximately $\underline{\operatorname{Additive}}$ (Normal) for large $n$ with mean $\mathrm{stdDev}$ and standard deviation $\mathrm{stdDev}/\sqrt{2n}$.

Applying the lemma with $a = \frac{1}{\sqrt{2}}$ and $b = 1$:

$$\operatorname{Spread}[\operatorname{StdDev}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot \frac{1}{\sqrt{2}} \cdot \mathrm{stdDev}/\sqrt{n} = z_{0.75} \cdot \mathrm{stdDev}/\sqrt{n}$$

For the dispersion drift, we use the relative spread formula:

$$\operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = \frac{\operatorname{Spread}[\operatorname{StdDev}(X_{1:n})]}{\operatorname{Center}[\operatorname{StdDev}(X_{1:n})]}$$

Since $\operatorname{Center}[\operatorname{StdDev}(X_{1:n})] \approx \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = \frac{z_{0.75} \cdot \mathrm{stdDev}/\sqrt{n}}{\mathrm{stdDev}} = z_{0.75}/\sqrt{n}$$

Therefore:

$$\operatorname{Drift}(\operatorname{StdDev}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = z_{0.75}$$

Numerically, $z_{0.75} \approx 0.67449$.

Asymptotic MAD Drift

For the median absolute deviation $\operatorname{MAD}(\mathbf{x}) = \operatorname{Median}(\lvert x_i - \operatorname{Median}(\mathbf{x}) \rvert)$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately $\underline{\operatorname{Additive}}$ (Normal).

For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the population MAD equals $z_{0.75} \cdot \mathrm{stdDev}$. The asymptotic standard deviation of the sample MAD is:

$$\operatorname{StdDev}[\operatorname{MAD}(X_{1:n})] = c_{\mathrm{mad}} \cdot \mathrm{stdDev}/\sqrt{n}$$

where $c_{\mathrm{mad}} \approx 0.78$.

Applying the lemma with $a = c_{\mathrm{mad}}$ and $b = z_{0.75}$:

$$\operatorname{Spread}[\operatorname{MAD}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}} \cdot \mathrm{stdDev}/\sqrt{n}$$

Since $\operatorname{Center}[\operatorname{MAD}(X_{1:n})] \approx z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{MAD}(X_{1:n})] = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}} \cdot \mathrm{stdDev}/\sqrt{n}}{z_{0.75} \cdot \mathrm{stdDev}} = \frac{\sqrt{2} \cdot c_{\mathrm{mad}}}{\sqrt{n}}$$

Therefore:

$$\operatorname{Drift}(\operatorname{MAD}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{MAD}(X_{1:n})] = \sqrt{2} \cdot c_{\mathrm{mad}}$$

Numerically, $\sqrt{2} \cdot c_{\mathrm{mad}} \approx \sqrt{2} \cdot 0.78 \approx 1.10$.

Asymptotic Spread Drift

For the sample spread $\operatorname{Spread}(\mathbf{x}) = \underset{1 \leq i < j \leq n}{\operatorname{Median}} \lvert x_i - x_j \rvert$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately $\underline{\operatorname{Additive}}$ (Normal).

The spread estimator computes all pairwise absolute differences and takes their median. For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the population spread equals $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ as derived in the Asymptotic Spread Value section.

The asymptotic standard deviation of the sample spread for the $\underline{\operatorname{Additive}}$ (Normal) distribution is:

$$\operatorname{StdDev}[\operatorname{Spread}(X_{1:n})] = c_{\mathrm{spr}} \cdot \mathrm{stdDev}/\sqrt{n}$$

where $c_{\mathrm{spr}} \approx 0.72$.

Applying the lemma with $a = c_{\mathrm{spr}}$ and $b = \sqrt{2} \cdot z_{0.75}$:

$$\operatorname{Spread}[\operatorname{Spread}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}} \cdot \mathrm{stdDev}/\sqrt{n}$$

Since $\operatorname{Center}[\operatorname{Spread}(X_{1:n})] \approx \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{Spread}(X_{1:n})] = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}} \cdot \mathrm{stdDev}/\sqrt{n}}{\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}} = c_{\mathrm{spr}}/\sqrt{n}$$

Therefore:

$$\operatorname{Drift}(\operatorname{Spread}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{Spread}(X_{1:n})] = c_{\mathrm{spr}}$$

Numerically, $c_{\mathrm{spr}} \approx 0.72$.
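The three dispersion drifts can likewise be checked by Monte Carlo. This is a sketch under the standard normal model (function names, seed, and sample counts are illustrative; at $n = 50$ the estimates approximate, but do not exactly equal, the asymptotic constants):

```python
import random
import statistics


def dispersion_drift(estimator, n=50, pairs=2000, seed=11):
    """Monte Carlo sketch of Drift(T, X) = sqrt(n) * Spread[T_n] / Center[T_n]
    under the standard Additive (Normal) model (Center[T_n] -> b * stdDev)."""
    rng = random.Random(seed)
    ests = [estimator([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(2 * pairs)]
    # Spread[T_n]: median absolute difference over disjoint estimate pairs
    spread_t = statistics.median(
        abs(ests[2 * i] - ests[2 * i + 1]) for i in range(pairs)
    )
    return n ** 0.5 * spread_t / statistics.median(ests)


def mad(xs):
    """Median absolute deviation around the median."""
    med = statistics.median(xs)
    return statistics.median(abs(v - med) for v in xs)


def shamos(xs):
    """Shamos scale estimator: median of pairwise absolute differences, i < j."""
    return statistics.median(
        abs(xs[i] - xs[j]) for i in range(len(xs)) for j in range(i + 1, len(xs))
    )


# asymptotic targets: StdDev -> 0.674, MAD -> 1.10, Spread -> 0.72
```

Running `dispersion_drift` with `statistics.stdev`, `mad`, and `shamos` reproduces the ordering StdDev < Spread < MAD from the summary below.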

Summary

Summary for average estimators:

| Estimator | $\operatorname{Drift}(E, X)$ | $\operatorname{Drift}^2(E, X)$ | $\frac{1}{\operatorname{Drift}^2(E, X)}$ |
|---|---|---|---|
| $\operatorname{Mean}$ | $1$ | $1$ | $1$ |
| $\operatorname{Median}$ | $\approx 1.253$ | $\frac{\pi}{2} \approx 1.571$ | $\frac{2}{\pi} \approx 0.637$ |
| $\operatorname{Center}$ | $\approx 1.023$ | $\frac{\pi}{3} \approx 1.047$ | $\frac{3}{\pi} \approx 0.955$ |

The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from $\operatorname{Mean}$ to $\operatorname{Median}$ while maintaining the same precision requires increasing the sample size by a factor of $\frac{\pi}{2} \approx 1.571$ (about 57% more observations). Similarly, switching from $\operatorname{Mean}$ to $\operatorname{Center}$ requires only about 5% more observations.

The inverse squared drift (rightmost column) equals the classical statistical efficiency relative to the $\operatorname{Mean}$. The $\operatorname{Mean}$ achieves optimal performance (unit efficiency) for the $\underline{\operatorname{Additive}}$ (Normal) distribution, as expected from classical theory. The $\operatorname{Center}$ maintains 95.5% efficiency while offering greater robustness to outliers, making it an attractive alternative when some contamination is possible. The $\operatorname{Median}$, while most robust, operates at only 63.7% efficiency under purely $\underline{\operatorname{Additive}}$ (Normal) conditions.

Summary for dispersion estimators:

For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the asymptotic drift values reveal the relative precision of different dispersion estimators:

| Estimator | $\operatorname{Drift}(E, X)$ | $\operatorname{Drift}^2(E, X)$ | $\frac{1}{\operatorname{Drift}^2(E, X)}$ |
|---|---|---|---|
| $\operatorname{StdDev}$ | $\approx 0.67$ | $\approx 0.45$ | $\approx 2.22$ |
| $\operatorname{MAD}$ | $\approx 1.10$ | $\approx 1.22$ | $\approx 0.82$ |
| $\operatorname{Spread}$ | $\approx 0.72$ | $\approx 0.52$ | $\approx 1.92$ |

The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from $\operatorname{StdDev}$ to $\operatorname{MAD}$ while maintaining the same precision requires increasing the sample size by a factor of $\frac{1.22}{0.45} \approx 2.71$ (more than doubling the observations). Similarly, switching from $\operatorname{StdDev}$ to $\operatorname{Spread}$ requires a factor of $\frac{0.52}{0.45} \approx 1.16$.

The $\operatorname{StdDev}$ achieves optimal performance for the $\underline{\operatorname{Additive}}$ (Normal) distribution. The $\operatorname{MAD}$ requires about 2.7 times more data to match the $\operatorname{StdDev}$ precision while offering greater robustness to outliers. The $\operatorname{Spread}$ requires about 1.16 times more data to match the $\operatorname{StdDev}$ precision under purely $\underline{\operatorname{Additive}}$ (Normal) conditions while maintaining robustness.