Studies
This section analyzes the estimators' properties using mathematical proofs. Most proofs are adapted from various textbooks and research papers; only essential references are provided.
Unlike the main part of the manual, the studies require knowledge of classic statistical methods. Well-known facts and commonly accepted notation are used without special introduction. The studies provide detailed analyses of estimator properties for practitioners interested in rigorous proofs and numerical simulation results.
Summary of Estimator Properties
This section compares the toolkit's robust estimators against traditional statistical methods to demonstrate their advantages. While traditional estimators often work well under ideal conditions, the toolkit's estimators maintain reliable performance across diverse real-world scenarios.
Average Estimators:
- Mean (arithmetic average): the sum of all measurements divided by the sample size.
- Median: the middle value of the sorted sample.
- Center (Hodges-Lehmann estimator): the median of all pairwise averages.
Dispersion Estimators:
- Standard Deviation: the square root of the mean squared deviation from the Mean.
- Median Absolute Deviation (around the median): the median of absolute deviations from the sample Median.
- Spread (Shamos scale estimator): the median of all pairwise absolute differences.
Breakdown
Heavy-tailed distributions naturally produce extreme outliers that completely distort traditional estimators. A single extreme measurement from the distribution can make the sample mean arbitrarily large. Real-world data can also contain corrupted measurements from instrument failures, recording errors, or transmission problems. Both natural extremes and data corruption create the same challenge: extracting reliable information when some measurements are too influential.
The breakdown point (Huber 2009) is the fraction of a sample that can be replaced by arbitrarily large values without making an estimator arbitrarily large. The theoretical maximum is 50%; no estimator can guarantee reliable results when more than half the measurements are extreme or corrupted. In such cases, summary estimators are not applicable, and a more sophisticated approach is needed.
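The contrast can be seen in a few lines of Python (an illustration, not toolkit code): a single corrupted measurement makes the mean arbitrarily large, while the median barely moves until more than half the sample is replaced.

```python
import statistics

clean = [4.2, 4.7, 4.9, 5.0, 5.1, 5.3, 5.8]

corrupted = clean.copy()
corrupted[0] = 1e9  # a single corrupted measurement

print(statistics.mean(clean), statistics.mean(corrupted))      # mean explodes
print(statistics.median(clean), statistics.median(corrupted))  # median barely moves
```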
A 50% breakdown point is rarely needed in practice, as more conservative values also cover practical needs. Additionally, a high breakdown point corresponds to low precision (information is lost by neglecting part of the data). The optimal practical breakdown point lies between 0% (no robustness) and 50% (low precision).
The Center and Spread estimators achieve 29% breakdown points, providing substantial protection against realistic contamination levels while maintaining good precision. Below is a comparison with traditional estimators.
Asymptotic breakdown points for average estimators:

| Mean | Median | Center |
|---|---|---|
| 0% | 50% | 29% |
Asymptotic breakdown points for dispersion estimators:

| StdDev | MAD | Spread |
|---|---|---|
| 0% | 50% | 29% |
Drift
Drift measures estimator precision by quantifying how much estimates scatter across repeated samples. It is based on the Spread of estimates and therefore has a breakdown point of approximately 29%.
Drift is useful for comparing the precision of several estimators. To simplify the comparison, one of the estimators can be chosen as a baseline. A table of squared drift values, normalized by the baseline, shows the required sample size adjustment factor for switching from the baseline to another estimator. For example, if Center is the baseline and the rescaled squared drift of Median is 1.5, then Median requires 1.5 times more data than Center to achieve the same precision. See the From Statistical Efficiency to Drift section for details.
Squared Asymptotic Drift of Average Estimators (values are approximated):

| Distribution | Mean | Median | Center |
|---|---|---|---|
| Additive (Normal) | 1.0 | 1.571 | 1.047 |
| | 3.95 | 1.40 | 1.7 |
| | 1.88 | 1.88 | 1.69 |
| | | 0.9 | 2.1 |
| | 0.88 | 2.60 | 0.94 |
Rescaled to Center (sample size adjustment factors):

| Distribution | Mean | Median | Center |
|---|---|---|---|
| Additive (Normal) | 0.96 | 1.50 | 1.0 |
| | 2.32 | 0.82 | 1.0 |
| | 1.11 | 1.11 | 1.0 |
| | | 0.43 | 1.0 |
| | 0.936 | 2.77 | 1.0 |
Squared Asymptotic Drift of Dispersion Estimators (values are approximated):

| Distribution | StdDev | MAD | Spread |
|---|---|---|---|
| Additive (Normal) | 0.45 | 1.22 | 0.52 |
| | | 2.26 | 1.81 |
| | 1.69 | 1.92 | 1.26 |
| | | 3.5 | 4.4 |
| | 0.18 | 0.90 | 0.43 |
Rescaled to Spread (sample size adjustment factors):

| Distribution | StdDev | MAD | Spread |
|---|---|---|---|
| Additive (Normal) | 0.87 | 2.35 | 1.0 |
| | | 1.25 | 1.0 |
| | 1.34 | 1.52 | 1.0 |
| | | 0.80 | 1.0 |
| | 0.42 | 2.09 | 1.0 |
Invariance
Invariance properties determine how estimators respond to data transformations. These properties are crucial for analysis design and interpretation:
- Location-invariant estimators are invariant to additive shifts: T(x + c) = T(x)
- Scale-invariant estimators are invariant to positive rescaling: T(k · x) = T(x) for k > 0
- Equivariant estimators change predictably with transformations, maintaining relative relationships
Choosing estimators with appropriate invariance properties ensures that results remain meaningful across different measurement scales, units, and data transformations. For example, when comparing datasets collected with different instruments or protocols, location-invariant estimators eliminate the need for data centering, while scale-invariant estimators eliminate the need for normalization.
Location-invariance: An estimator T is location-invariant if adding a constant to the measurements leaves the result unchanged: T(x1 + c, ..., xn + c) = T(x1, ..., xn)
Location-equivariance: An estimator T is location-equivariant if it shifts with the data: T(x1 + c, ..., xn + c) = T(x1, ..., xn) + c
Scale-invariance: An estimator T is scale-invariant if multiplying by a positive constant k leaves the result unchanged: T(k · x1, ..., k · xn) = T(x1, ..., xn)
Scale-equivariance: An estimator T is scale-equivariant if it scales proportionally with the data: T(k · x1, ..., k · xn) = k · T(x1, ..., xn)
| Estimator | Location | Scale |
|---|---|---|
| Center | Equivariant | Equivariant |
| Spread | Invariant | Equivariant |
| RelSpread | – | Invariant |
| Shift | Invariant | Equivariant |
| Ratio | – | Invariant |
| AvgSpread | Invariant | Equivariant |
| Disparity | Invariant | Invariant |
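The invariance properties in the table can be spot-checked numerically. The sketch below (an illustration using stdlib stand-ins, not toolkit code) verifies that the median is location-equivariant while the MAD around the median is location-invariant and scale-equivariant.

```python
import statistics

def mad(xs):
    # Median absolute deviation around the sample median
    m = statistics.median(xs)
    return statistics.median(abs(x - m) for x in xs)

x = [1.0, 2.0, 4.0, 8.0, 9.0]
c, k = 10.0, 3.0

shifted = [v + c for v in x]
scaled = [k * v for v in x]

# Location-equivariance of the median: Median(x + c) = Median(x) + c
assert statistics.median(shifted) == statistics.median(x) + c
# Location-invariance of MAD: MAD(x + c) = MAD(x)
assert mad(shifted) == mad(x)
# Scale-equivariance of MAD: MAD(k * x) = k * MAD(x)
assert mad(scaled) == k * mad(x)
```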
Reframings
From Statistical Efficiency to Drift
Statistical efficiency measures estimator precision (Serfling 2009). When multiple estimators target the same quantity, efficiency determines which provides more reliable results.
Efficiency measures how tightly estimates cluster around the true value across repeated samples. For an estimator T applied to samples from distribution X, absolute efficiency is defined relative to the optimal (minimum-variance) estimator T*:

eff(T) = Var(T*) / Var(T)

Relative efficiency compares two estimators by taking the ratio of their variances:

eff(T1, T2) = Var(T2) / Var(T1)
Under the Additive (Normal) distribution, this approach works well. The sample mean achieves optimal efficiency, while the median operates at roughly 64% efficiency.
However, this variance-based definition creates four critical limitations:
- Absolute efficiency requires knowing the optimal estimator, which is difficult to determine. For many distributions, deriving the minimum-variance unbiased estimator requires complex mathematical analysis. Without this reference point, absolute efficiency cannot be computed.
- Relative efficiency only compares estimator pairs, preventing systematic evaluation. This limits understanding of how multiple estimators perform relative to each other. Practitioners cannot rank estimators comprehensively or evaluate individual performance in isolation.
- The approach depends on variance calculations that break down when variance becomes infinite or when distributions have heavy tails. Many real-world distributions, such as those with power-law tails, exhibit infinite variance. When the variance is undefined, efficiency comparisons become impossible.
- Variance is not robust to outliers, which can corrupt efficiency calculations. A single extreme observation can greatly inflate variance estimates. This sensitivity can make efficient estimators look inefficient and vice versa.
The drift concept provides a robust alternative. Drift measures estimator precision using Spread instead of variance, providing reliable comparisons across a wide range of distributions.
For an average estimator T, random variable X, and sample size n:

Drift_n(T, X) = sqrt(n) · Spread(T_n) / Spread(X)

where T_n is the estimate computed on a sample of n independent draws from X. This formula measures estimator variability compared to data variability. Spread(T_n) captures the median absolute difference between estimates across repeated samples. Multiplying by sqrt(n) removes sample size dependency, making drift values comparable across different sample sizes. Dividing by Spread(X) creates a scale-free measure that provides consistent drift values across different distribution parameters and measurement units.
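The definition can be checked by simulation. The sketch below (an assumption-laden illustration, not the toolkit's implementation) estimates drift for the mean and the median under a standard Normal distribution; the mean should land near 1.0 and the median near 1.25.

```python
import itertools
import random
import statistics

def spread(xs):
    # Median absolute pairwise difference (plain O(n^2) reference version)
    return statistics.median(abs(a - b) for a, b in itertools.combinations(xs, 2))

def drift(estimator, n, reps=400, rng=None):
    """Monte Carlo estimate of sqrt(n) * Spread(estimates) / Spread(X) for X ~ N(0, 1)."""
    rng = rng or random.Random(1729)
    estimates = [estimator([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
    spread_x = 2 ** 0.5 * statistics.NormalDist().inv_cdf(0.75)  # exact Spread of N(0, 1)
    return n ** 0.5 * spread(estimates) / spread_x

print(round(drift(statistics.mean, 100), 2))    # lands near 1.0
print(round(drift(statistics.median, 100), 2))  # lands near 1.25
```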
Dispersion estimators use a parallel formulation:

Drift_n(T, X) = sqrt(n) · Spread(T_n) / T_inf(X)

Here T_inf(X), the estimator's asymptotic value, normalizes by the estimator's typical value for fair comparison.
Drift offers four key advantages:
- For estimators with sqrt(n) convergence rates, drift remains finite and comparable across distributions; for heavier tails, drift may diverge, flagging estimator instability.
- It provides absolute precision measures rather than only pairwise comparisons.
- The robust foundation resists outlier distortion that corrupts variance-based calculations.
- The normalization makes drift values comparable across different sample sizes, enabling direct comparison of estimator performance regardless of sample size.
Under the Additive (Normal) distribution, drift matches traditional efficiency. The sample mean achieves drift near 1.0; the median achieves drift around 1.25. This consistency validates drift as a proper generalization of efficiency that extends to realistic data conditions where traditional efficiency fails.
When switching from estimator A to estimator B while maintaining the same precision, the required sample size adjustment follows:

n_B = n_A · (Drift(B) / Drift(A))²

The ratio of squared drifts determines the data requirement change. If B has drift 1.5 times higher than A, then B requires 1.5² = 2.25 times more data to match A's precision. Conversely, switching to a more precise estimator allows smaller sample sizes.
For asymptotic analysis, Drift(T, X) without a subscript denotes the limiting value as n approaches infinity. With a baseline estimator, rescaled drift values Drift²(T, X) / Drift²(T_baseline, X) enable direct comparisons.
The standard drift definition assumes the sqrt(n) convergence rates typical under the Additive (Normal) distribution. For broader applicability, drift generalizes to:

Drift_n(T, X) = n^γ · Spread(T_n) / Spread(X)

The instability parameter γ adapts to estimator convergence rates. The toolkit uses γ = 1/2 throughout because this choice provides natural intuition and mental representation for the Additive (Normal) distribution. Rather than introduce additional complexity through variable instability parameters, the fixed sqrt(n) scaling offers practical convenience while maintaining theoretical rigor for the distribution classes most common in applications.
From Confidence Level to Misrate
Traditional statistics expresses uncertainty through confidence levels: 95% confidence interval, 99% confidence, 99.9% confidence. This convention emerged from early statistical practice when tables printed confidence intervals for common levels like 90%, 95%, and 99%.
The confidence level approach creates practical problems:
- Cognitive difficulty with high confidence. Distinguishing between 99.999% and 99.9999% confidence requires mental effort. The difference matters — one represents a 1-in-100,000 error rate, the other 1-in-1,000,000 — but the representation obscures this distinction.
- Asymmetric scale. The confidence level scale compresses near 100%, where most practical values cluster. Moving from 90% to 95% represents a 2× change in error rate, while moving from 99% to 99.9% represents a 10× change, despite similar visual spacing.
- Indirect interpretation. Practitioners care about error rates, not success rates. "What's the chance I'm wrong?" matters more than "What's the chance I'm right?" Confidence level forces mental subtraction to answer the natural question.
- Unclear defaults. Traditional practice offers no clear default confidence level. Different fields use different conventions (95%, 99%, 99.9%), creating inconsistency and requiring arbitrary choices.
The misrate parameter provides a more natural representation. Misrate expresses the probability that computed bounds fail to contain the true value:

misrate = 1 − confidence level
This simple inversion provides several advantages:
- Direct interpretation. A misrate of 0.01 means a 1% chance of error, or "wrong 1 time in 100". A misrate of 1e-6 means wrong 1 time in a million. No mental arithmetic required.
- Linear scale for practical values. 0.1 (10%), 0.01 (1%), and 0.001 (0.1%) form a natural sequence. Scientific notation handles extreme values cleanly: 1e-4, 1e-5, 1e-6.
- Clear comparisons. 1e-5 versus 1e-6 immediately shows a 10× difference in error tolerance. 99.999% versus 99.9999% confidence obscures this same relationship.
- Pragmatic default. The toolkit recommends 0.001 (one-in-a-thousand error rate) as a reasonable default for everyday analysis. For critical decisions where errors are costly, use 1e-6 (one-in-a-million).
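The inversion itself is trivial, which is the point. A minimal sketch (illustration only, with hypothetical helper names):

```python
def misrate(confidence_level: float) -> float:
    """Probability that the bounds fail to contain the true value."""
    return 1.0 - confidence_level

def confidence_level(misrate: float) -> float:
    return 1.0 - misrate

print(misrate(0.999))         # wrong about 1 time in 1000
print(confidence_level(1e-6))
```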
The terminology shift from confidence level to misrate parallels other clarifying renames in this toolkit. Just as Additive better describes the distribution's formation than Normal, and Center better describes the estimator's purpose than Hodges-Lehmann, misrate better describes the quantity practitioners actually reason about: the probability of error.
Traditional confidence intervals become bounds in this framework, eliminating statistical jargon in favor of descriptive terminology. A name like ShiftBounds clearly indicates what it provides: bounds on the shift, with a specified error rate. No background in classical statistics is required to understand the concept.
From Mann-Whitney U-test to Pairwise Margin
The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) ranks among the most widely used non-parametric statistical tests, testing whether two independent samples come from the same distribution. Under the Additive (Normal) distribution, it achieves nearly the same precision as Student's t-test, while maintaining reliability under diverse distributional conditions where the t-test fails.
The test operates by comparing all pairs of measurements between the two samples. Given samples x = (x1, ..., xn) and y = (y1, ..., ym), the Mann-Whitney statistic U counts how many pairs satisfy xi > yj:

U = sum over i, j of 1[xi > yj]

If the samples come from the same distribution, U should be near nm/2 (roughly half the pairs favor x, half favor y). Large deviations from nm/2 suggest the distributions differ.
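The counting itself is a double loop (an illustration, not toolkit code):

```python
def comparison_count(x, y):
    # Number of pairs with x_i > y_j; equivalently, positive pairwise differences
    return sum(1 for xi in x for yj in y if xi > yj)

x = [1.2, 3.4, 5.6, 7.8]
y = [2.3, 4.5, 6.7]

u = comparison_count(x, y)
print(u, len(x) * len(y) / 2)  # here u equals n*m/2 = 6 exactly
```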
The test answers: could this U value arise by chance if the samples were truly equivalent? The p-value quantifies this probability. If p < 0.05, traditional practice declares the difference statistically significant.
This approach creates several problems for practitioners:
- Binary thinking. The test produces a yes/no answer: reject or fail to reject the null hypothesis. Practitioners typically want to know the magnitude of difference, not just whether one exists.
- Arbitrary thresholds. The 0.05 threshold has no universal justification, yet it dominates practice and creates a false dichotomy between p = 0.049 and p = 0.051.
- Hypothesis-centric framework. The test assumes a null hypothesis of no difference and evaluates evidence against it. Real questions rarely concern exact equality; practitioners want to know "how different?" rather than "different or not?"
- Inverted logic. The natural question is "what shifts are consistent with my data?" The test answers "is this specific shift (zero) consistent with my data?"
The toolkit inverts this framework. Instead of testing whether a hypothesized shift is plausible, we compute which shifts are plausible given the data. This inversion transforms hypothesis testing into bounds estimation.
The mathematical foundation remains the same. The distribution of pairwise comparisons under random sampling determines which order statistics of pairwise differences form reliable bounds. The Mann-Whitney statistic measures pairwise comparisons (xi > yj). The Shift estimator uses pairwise differences (xi − yj). These quantities are mathematically related: a pairwise difference xi − yj is positive exactly when xi > yj. The toolkit renames this comparison count, clarifying its purpose: measuring how often one sample dominates the other in pairwise comparisons.
The distribution of U determines which order statistics form reliable bounds. The margin function computes how many extreme pairwise differences could occur by chance with probability equal to the misrate, based on the distribution of pairwise comparisons.
The margin function requires knowing the distribution of pairwise comparisons under sampling. Two computational approaches exist:
- Exact computation (Löffler's algorithm, 1982). Uses a recurrence relation to compute the exact distribution of pairwise comparisons for small samples. Practical for combined sample sizes up to several hundred.
- Approximation (Edgeworth expansion, 1955). Refines the normal approximation with correction terms based on higher moments of the distribution. Provides accurate results for large samples where exact computation becomes impractical.
The toolkit automatically selects the appropriate method based on sample sizes, ensuring both accuracy and computational efficiency.
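The exact distribution can be computed with a plain dynamic program over rank subsets; the sketch below is an illustration of that idea, not Löffler's memory-optimized recurrence. It counts, for each possible rank sum of one sample, how many rank assignments produce it, then converts rank sums to comparison counts.

```python
from math import comb

def u_distribution(n, m):
    """Exact null distribution of the comparison count U for sample sizes n, m.
    Plain dynamic program over rank subsets (Loeffler's version refines memory use)."""
    N = n + m
    smax = sum(range(N - n + 1, N + 1))          # largest possible rank sum for n ranks
    ways = [[0] * (smax + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for r in range(1, N + 1):                    # consider each rank exactly once
        for k in range(min(r, n), 0, -1):        # subsets that include rank r
            row, prev = ways[k], ways[k - 1]
            for s in range(smax, r - 1, -1):
                row[s] += prev[s - r]
    total = comb(N, n)
    offset = n * (n + 1) // 2                    # U = W - n(n+1)/2
    return [ways[n][u + offset] / total for u in range(n * m + 1)]

print(u_distribution(2, 2))  # probabilities 1/6, 1/6, 2/6, 1/6, 1/6
```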
This approach naturally complements the toolkit's pairwise estimators:
- Center uses the median of pairwise averages
- Shift uses the median of pairwise differences
- Ratio uses the median of pairwise ratios
- ShiftBounds uses order statistics of the same pairwise differences
All procedures build on pairwise operations. This structural consistency reflects the mathematical unity underlying robust statistics: pairwise operations provide natural robustness while maintaining computational feasibility and statistical efficiency.
The inversion from hypothesis testing to bounds estimation represents a philosophical shift in statistical practice. Traditional methods ask should I believe this specific hypothesis? Pragmatic methods ask what should I believe, given this data? Bounds provide actionable answers: they tell practitioners which values are plausible, enabling informed decisions without arbitrary significance thresholds.
Traditional Mann-Whitney implementations apply tie correction when samples contain repeated values. This correction modifies variance calculations to account for tied observations, changing p-values and confidence intervals in ways that depend on measurement precision. The toolkit deliberately omits tie correction. Continuous distributions produce theoretically distinct values; observed ties result from finite measurement precision and digital representation. When measurements appear identical, this reflects rounding of underlying continuous variation, not true equality in the measured quantity. Treating ties as artifacts of discretization rather than distributional features simplifies computation while maintaining accuracy. The exact and approximate methods compute comparison distributions without requiring adjustments for tied values, eliminating a source of complexity and potential inconsistency in statistical practice.
Historical Development
The mathematical foundations emerged through decades of refinement. Mann and Whitney (1947) established the distribution of pairwise comparisons under random sampling, creating the theoretical basis for comparing samples through rank-based methods. Their work demonstrated that comparison counts follow predictable patterns regardless of the underlying population distributions.
The original computational approaches suffered from severe limitations. Mann and Whitney proposed a slow exact method requiring exponential resources and a normal approximation that proved grossly inaccurate for practical use. The approximation works reasonably in distribution centers but fails catastrophically in the tails where practitioners most need accuracy. For moderate sample sizes, approximation errors can reach orders of magnitude.
Fix and Hodges (1955) addressed the approximation problem through higher-order corrections. Their expansion adds terms based on the distribution's actual moments rather than assuming perfect normality. This refinement reduces tail probability errors from orders of magnitude to roughly 1%, making approximation practical for large samples where exact computation becomes infeasible.
Löffler (1982) solved the exact computation problem through algorithmic innovation. The naive recurrence requires quadratic memory, infeasible for samples beyond a few dozen measurements. Löffler discovered a reformulation that reduces memory to linear scale, making exact computation practical for combined sample sizes up to several hundred.
Despite these advances, most statistical software continues using the 1947 approximation. The computational literature contains the solutions, but software implementations lag decades behind theoretical developments. This toolkit implements both the exact method for small samples and the refined approximation for large samples, automatically selecting the appropriate approach based on sample sizes.
The shift from hypothesis testing to bounds estimation requires no new mathematics. The same comparison distributions that enable hypothesis tests also determine which order statistics form reliable bounds. Traditional applications ask is zero plausible? and answer yes or no. This toolkit asks which values are plausible? and answers with an interval. The perspective inverts while the mathematical foundation remains identical.
Notes
On Bootstrap for Center Bounds
A natural question arises: can bootstrap resampling improve coverage for asymmetric distributions where the weak symmetry assumption fails?
The idea is appealing. The signed-rank approach computes bounds from order statistics of Walsh averages using a margin derived from the Wilcoxon distribution, which assumes symmetric deviations from the center. Bootstrap makes no symmetry assumption: resample the data with replacement, compute Center on each resample, and take quantiles of the bootstrap distribution as bounds. This should yield valid bounds regardless of distributional shape.
This manual deliberately does not provide a bootstrap-based alternative to CenterBounds. The reasons are both computational and statistical.
Computational cost
CenterBounds computes bounds directly: a single pass through the order statistics of the Walsh averages, guided by the signed-rank margin. No resampling, no iteration.
A bootstrap version requires B resamples (typically thousands for stable tail quantiles), each computing Center on the resample, where Center is evaluated via the fast selection algorithm on the implicit pairwise matrix. The total cost per bounds call is roughly B times that of the signed-rank approach.
Each call to Center operates on n(n+1)/2 Walsh averages, and the bootstrap recomputes this B times. The computation is not deep; it is merely wasteful. In a simulation study that evaluates bounds across many samples, this cost becomes prohibitive.
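The shape of the cost argument is easy to see in code. The sketch below is a minimal percentile-bootstrap illustration (not a toolkit feature): every one of the B resamples pays a full Center evaluation over its Walsh averages.

```python
import itertools
import random
import statistics

def center(xs):
    # Hodges-Lehmann: median of all pairwise averages, including each element with itself
    return statistics.median((a + b) / 2
                             for a, b in itertools.combinations_with_replacement(xs, 2))

def bootstrap_center_bounds(xs, misrate=0.05, B=1000, rng=None):
    """Percentile bootstrap: B resamples, each paying a full Center evaluation."""
    rng = rng or random.Random(0)
    estimates = sorted(center(rng.choices(xs, k=len(xs))) for _ in range(B))
    lo = estimates[int(B * misrate / 2)]
    hi = estimates[int(B * (1 - misrate / 2)) - 1]
    return lo, hi
```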
Statistical quality
Bootstrap bounds are nominal, not exact. The percentile method has well-documented undercoverage for small samples: requesting 95% confidence (misrate 0.05) typically yields 85–92% actual coverage for small n. This is inherent to the bootstrap percentile method: the quantile estimates from resamples are biased toward the sample and underrepresent tail behavior. Refined methods (BCa, bootstrap-t) partially address this but add complexity and still provide only asymptotic guarantees.
Meanwhile, CenterBounds provides exact distribution-free coverage under symmetry. For a requested misrate, the signed-rank method delivers exactly that misrate. A bootstrap method, requesting the same misrate, typically delivers a larger actual misrate. The exact method is simultaneously faster and more accurate.
Behavior under asymmetry
Under asymmetric distributions, the signed-rank margin is no longer calibrated: the Wilcoxon distribution assumes symmetric deviations, and asymmetry shifts the actual distribution of comparison counts.
However, the coverage degradation is gradual, not catastrophic. Mild asymmetry produces mild coverage drift. The bounds remain meaningful — they still bracket the pseudomedian using order statistics of the Walsh averages — but the actual misrate differs from the requested value.
This is the same situation as bootstrap, which also provides only approximate coverage. The practical difference is that the signed-rank approach achieves this approximate coverage directly, while bootstrap pays for B resamples to achieve comparable approximate coverage.
Why not both?
One might argue for providing both methods: the signed-rank approach as default, and a bootstrap variant for cases where symmetry is severely violated.
This creates a misleading choice. If the bootstrap method offered substantially better coverage under asymmetry, the complexity would be justified. But for the moderately asymmetric distributions practitioners encounter, the coverage difference between the two approaches is small relative to the cost difference. For extreme asymmetries where the signed-rank coverage genuinely breaks down, the sign test provides an alternative foundation for median bounds (see the study on misrate efficiency of MedianBounds), but its slow efficiency convergence makes it impractical for moderate sample sizes.
The toolkit therefore provides CenterBounds as the single bounds estimator for the center. The weak symmetry assumption means the method performs well under approximate symmetry and degrades gracefully under moderate asymmetry. There is no useful middle ground that justifies a computational penalty for marginally different approximate coverage.
On Misrate Efficiency of MedianBounds
This study analyzes MedianBounds, a bounds estimator for the population median based on the sign test, and explains why pragmastat omits it in favor of CenterBounds.
Definition
MedianBounds(x, misrate) = [x_(k), x_(n−k+1)]

where x_(1) ≤ ... ≤ x_(n) are the order statistics and k is the largest integer satisfying 1 ≤ k ≤ n/2 and 2 · P(B ≤ k − 1) ≤ misrate for B ~ Binomial(n, 1/2). The interval brackets the population median using order statistics, with the misrate controlling the probability that the true median falls outside the bounds.
MedianBounds requires no symmetry assumption, only weak continuity, making it applicable to arbitrarily skewed distributions. This is its principal advantage over CenterBounds, which assumes weak symmetry.
Sign test foundation
The method is equivalent to inverting the sign test. Under weak continuity, each observation independently falls above or below the true median with probability 1/2. The number of observations below the median follows Binomial(n, 1/2), and the order statistics x_(k) and x_(n−k+1) form a confidence interval whose coverage is determined exactly by the binomial CDF.
Because the binomial CDF is a step function, the achievable misrate values form a discrete set. The algorithm rounds down to the nearest achievable level, inevitably wasting part of the requested misrate budget. This study derives the resulting efficiency loss and its convergence rate.
Achievable misrate levels
The achievable misrate values for sample size n are:

alpha_k = 2 · P(B ≤ k − 1) = 2 · sum of C(n, i) / 2^n over i = 0, ..., k − 1, for k = 1, ..., floor(n/2)
The algorithm selects the largest alpha_k satisfying alpha_k ≤ alpha for the requested misrate alpha. The efficiency alpha_k / alpha measures how much of the requested budget is used. Efficiency 1 means the bounds are as tight as the misrate allows; efficiency 0.5 means half the budget is wasted, producing unnecessarily wide bounds.
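The achievable levels and the resulting efficiency can be enumerated directly from the binomial CDF (an illustration, not toolkit code):

```python
from math import comb

def achievable_misrates(n):
    """Achievable two-sided misrates for the sign-test interval:
    alpha_k = 2 * P(Binomial(n, 1/2) <= k - 1), for k = 1 .. n // 2."""
    cdf, levels = 0.0, []
    for k in range(1, n // 2 + 1):
        cdf += comb(n, k - 1) / 2 ** n
        levels.append(2 * cdf)
    return levels

def efficiency(n, alpha):
    """Fraction of the requested misrate budget actually used."""
    usable = [a for a in achievable_misrates(n) if a <= alpha]
    return max(usable) / alpha if usable else 0.0

print(efficiency(10, 0.05))  # less than half the budget is usable at n = 10
```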
Spacing between consecutive levels
The gap between consecutive achievable misrates is:

Delta_k = alpha_{k+1} − alpha_k = 2 · C(n, k) / 2^n
For a target misrate alpha, the relevant index satisfies k ≈ n/2 − z · sqrt(n)/2, where z = Phi^-1(1 − alpha/2) is the corresponding standard normal quantile. By the normal approximation to the binomial, Binomial(n, 1/2) ≈ Normal(n/2, n/4), the binomial CDF near this index changes by approximately:

Delta ≈ 4 · phi(z) / sqrt(n)

where phi is the standard normal density. This spacing governs how coarsely the achievable misrates are distributed near the target.
Expected efficiency
The requested misrate falls at a uniformly random position within a gap of width Delta. On average, the algorithm wastes Delta/2, giving expected efficiency:

E[efficiency] ≈ 1 − Delta / (2 · alpha) = 1 − 2 · phi(z) / (alpha · sqrt(n))

Define the misrate-dependent constant:

c(alpha) = 2 · phi(z) / alpha, with z = Phi^-1(1 − alpha/2)

Then the expected efficiency has the form:

E[efficiency] ≈ 1 − c(alpha) / sqrt(n)

The convergence rate is O(n^(−1/2)): efficiency improves as the square root of sample size.
Values of c(alpha)
The constant c(alpha) increases for smaller misrates, meaning tighter error tolerances require proportionally larger samples for efficient bounds.

Approximately, c(0.01) ≈ 2.9 and c(0.001) ≈ 3.6. Achieving 90% expected efficiency on average requires n ≥ (10 · c(alpha))²: for alpha = 0.01 this gives n ≈ 840; for alpha = 0.001 this gives n ≈ 1270.
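These constants follow from the formulas above and can be computed with the stdlib normal distribution (an illustration under the derivation's assumptions, not toolkit code):

```python
from math import ceil
from statistics import NormalDist

ND = NormalDist()

def c_alpha(alpha):
    """Misrate-dependent constant c(alpha) = 2 * phi(z) / alpha, z = Phi^-1(1 - alpha/2)."""
    z = ND.inv_cdf(1 - alpha / 2)
    return 2 * ND.pdf(z) / alpha

def n_for_efficiency(alpha, eff):
    """Smallest n with expected efficiency 1 - c/sqrt(n) >= eff (approximation)."""
    return ceil((c_alpha(alpha) / (1 - eff)) ** 2)

print(round(c_alpha(0.001), 2), n_for_efficiency(0.001, 0.9))
```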
Comparison with CenterBounds
CenterBounds uses the signed-rank statistic W with range 0, ..., n(n+1)/2. Under the null hypothesis, W has variance n(n+1)(2n+1)/24. The CDF spacing at the relevant quantile is:

Delta ≈ 2 · phi(z) / sqrt(n(n+1)(2n+1)/24) = O(n^(−3/2))

The expected efficiency for CenterBounds is therefore:

E[efficiency] ≈ 1 − O(n^(−3/2))

This converges at rate O(n^(−3/2)), three times the polynomial order of MedianBounds (exponent 3/2 versus 1/2). The difference arises because the signed-rank distribution has on the order of n² discrete levels compared to the binomial's n levels, providing fundamentally finer resolution.
Why pragmastat omits MedianBounds
The efficiency loss of MedianBounds is not an implementation artifact. It reflects a structural limitation of the sign test: using only the signs of deviations from the hypothesized median discards magnitude information, leaving only binary observations to determine coverage. The signed-rank test used by CenterBounds exploits both signs and ranks, producing on the order of n² comparison outcomes and correspondingly finer misrate resolution.
For applications requiring tight misrate control on the median, large samples (on the order of a thousand observations) are needed to ensure efficient use of the misrate budget. For smaller samples, the bounds remain valid but conservative: the actual misrate is guaranteed not to exceed the requested value, even though it may be substantially below it.
CenterBounds, with its O(n^(−3/2)) convergence, achieves near-continuous misrate control even for moderate n, at the cost of requiring weak symmetry. For the distributions practitioners typically encounter, this tradeoff favors CenterBounds as the single bounds estimator in the toolkit. When symmetry is severely violated, the coverage drift of CenterBounds is gradual (mild asymmetry produces mild drift), making it a robust default without the efficiency penalty inherent to the sign test approach.
Additive (Normal) Distribution
The Additive (Normal) distribution has two parameters: the mean μ and the standard deviation σ, written as Additive(μ, σ).
Asymptotic Spread Value
Consider two independent draws X and Y from the Additive(μ, σ) distribution. The goal is to find the median of their absolute difference |X − Y|. Define the difference D = X − Y. By linearity of expectation, E[D] = 0. By independence, Var(D) = 2σ². Thus D has distribution Additive(0, σ√2), and the problem reduces to finding the median of |D|. The location parameter disappears, as expected, because absolute differences are invariant to shifts.
Let s = σ√2, so that D ~ Additive(0, s). The random variable |D| then follows the Half-Normal (folded) distribution with scale s. Its cumulative distribution function for x ≥ 0 becomes

F(x) = 2 · Φ(x / s) − 1

where Φ denotes the standard Normal CDF.

The median is the point m at which this CDF equals 1/2. Setting 2 · Φ(m / s) − 1 = 1/2 gives

Φ(m / s) = 3/4

Applying the inverse CDF yields m = s · Φ⁻¹(3/4). Substituting s = σ√2 back produces

m = σ√2 · Φ⁻¹(3/4)

Define c = √2 · Φ⁻¹(3/4) ≈ 0.9539. Numerically, the median absolute difference is approximately 0.954 σ. This expression depends only on the scale parameter σ, not on the mean, reflecting the translation invariance of the problem.
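The constant can be verified both analytically and by simulation (an illustration using the stdlib normal distribution, not toolkit code):

```python
import random
import statistics
from statistics import NormalDist

c = 2 ** 0.5 * NormalDist().inv_cdf(0.75)   # Spread(Additive(mu, sigma)) = c * sigma
print(round(c, 4))                          # 0.9539

# Simulation check: median |X - Y| for independent X, Y ~ N(0, 1)
rng = random.Random(42)
diffs = [abs(rng.gauss(0, 1) - rng.gauss(0, 1)) for _ in range(100_000)]
print(round(statistics.median(diffs), 3))   # close to c
```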
Lemma: Average Estimator Drift Formula
For average estimators with asymptotic standard deviation σ_T / √n around the mean μ, define c = √2 · Φ⁻¹(3/4) ≈ 0.9539. In the Additive (Normal) case, Spread(X) = c · σ.
For any average estimator T with asymptotic standard deviation σ_T / √n around the mean, the drift calculation follows:
- The spread of two independent estimates: Spread(T_n) = c · σ_T / √n
- The relative spread: Spread(T_n) / Spread(X) = σ_T / (σ √n)
- The asymptotic drift: Drift(T, X) = √n · σ_T / (σ √n) = σ_T / σ
Asymptotic Mean Drift
For the sample mean applied to samples of size n from Additive(μ, σ), the sampling distribution of the Mean is also Additive with mean μ and standard deviation σ / √n.

Using the lemma with σ_T = σ (since the standard deviation is σ / √n):

Drift(Mean, X) = σ / σ = 1

The Mean achieves unit drift under the Additive (Normal) distribution, serving as the natural baseline for comparison. The Mean is the optimal estimator under the Additive (Normal) distribution: no other estimator achieves lower drift.
Asymptotic Median Drift
For the sample median applied to samples from Additive(μ, σ), the asymptotic sampling distribution of the Median is approximately Normal with mean μ and standard deviation σ √(π/2) / √n.

This result follows from the asymptotic theory of order statistics. For the median of a sample from a continuous distribution with density f and cumulative distribution F, the asymptotic variance is 1 / (4 n f(m)²), where m is the population median. For the Additive (Normal) distribution with standard deviation σ, the density at the median (which equals the mean) is 1 / (σ √(2π)). Thus the asymptotic variance becomes π σ² / (2n).

Using the lemma with σ_T = σ √(π/2):

Drift(Median, X) = √(π/2)

Numerically, √(π/2) ≈ 1.2533, so the median has approximately 25% higher drift than the mean under the Additive (Normal) distribution.
Asymptotic Center Drift
For the sample center applied to samples from Additive(μ, σ), its asymptotic sampling distribution must be determined.

The center estimator computes all pairwise averages (including each element paired with itself) and takes their median. For the Additive (Normal) distribution, asymptotic theory shows that the center estimator is asymptotically Normal with mean μ.

The exact asymptotic variance of the center estimator for the Additive (Normal) distribution is:

Var = π σ² / (3n)

This gives an asymptotic standard deviation of:

σ √(π/3) / √n

Using the lemma with σ_T = σ √(π/3):

Drift(Center, X) = √(π/3)

Numerically, √(π/3) ≈ 1.0233, so the center estimator achieves a drift very close to 1 under the Additive (Normal) distribution, performing nearly as well as the mean while offering greater robustness to outliers.
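The constants π/2 and π/3 can be sanity-checked by simulation (an illustration, not toolkit code): the standard deviation of Center estimates across repeated samples should exceed that of Mean estimates by roughly √(π/3) ≈ 1.02.

```python
import itertools
import math
import random
import statistics

def center(xs):
    # Hodges-Lehmann: median of all pairwise averages, including each element with itself
    return statistics.median((a + b) / 2
                             for a, b in itertools.combinations_with_replacement(xs, 2))

# Asymptotic squared drifts under the Additive (Normal) distribution
assert abs(math.pi / 2 - 1.571) < 1e-3   # Median
assert abs(math.pi / 3 - 1.047) < 1e-3   # Center

# Monte Carlo sanity check of the standard-deviation ratio
rng = random.Random(7)
samples = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(300)]
sd_mean = statistics.stdev(list(map(statistics.mean, samples)))
sd_center = statistics.stdev(list(map(center, samples)))
print(round(sd_center / sd_mean, 2))  # near sqrt(pi/3) ~ 1.02
```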
Lemma: Dispersion Estimator Drift Formula
For dispersion estimators with asymptotic value θ and standard deviation σ_T / √n, define c = √2 · Φ⁻¹(3/4) ≈ 0.9539.

For any dispersion estimator T with asymptotic distribution approximately Normal(θ, σ_T² / n), the drift calculation follows:
- The spread of two independent estimates: Spread(T_n) = c · σ_T / √n
- The relative spread: Spread(T_n) / θ = c · σ_T / (θ √n)
- The asymptotic drift: Drift(T, X) = √n · c · σ_T / (θ √n) = c · σ_T / θ

Note: The factor √2 comes from the standard deviation of the difference of two independent estimates, and the factor Φ⁻¹(3/4) converts this standard deviation to the median absolute difference.
Asymptotic StdDev Drift
For the sample standard deviation applied to samples from Additive(μ, σ), the sampling distribution of the StdDev is approximately Normal for large n with mean σ and standard deviation σ / √(2n).

Applying the lemma with σ_T = σ / √2 and θ = σ, we use the relative spread formula for the dispersion drift:

Drift(StdDev, X) = c · σ_T / θ

Since θ = σ asymptotically:

Drift(StdDev, X) = c · (σ / √2) / σ = c / √2 = Φ⁻¹(3/4)

Numerically, Drift(StdDev, X) ≈ 0.6745, so the squared drift is approximately 0.455.
Asymptotic MAD Drift
For the median absolute deviation applied to samples from Additive(μ, σ), the asymptotic distribution is approximately Normal.

For the Additive (Normal) distribution, the population MAD equals σ · Φ⁻¹(3/4) ≈ 0.6745 σ. The asymptotic standard deviation of the sample MAD is:

σ_MAD / √n, where σ_MAD = σ / (4 · φ(Φ⁻¹(3/4))) ≈ 0.787 σ

and φ is the standard normal density.

Applying the lemma with σ_T = σ_MAD and θ = σ · Φ⁻¹(3/4):

Drift(MAD, X) = c · σ_MAD / θ

Since θ = σ · Φ⁻¹(3/4) asymptotically:

Drift(MAD, X) = √2 / (4 · φ(Φ⁻¹(3/4)))

Numerically, Drift(MAD, X) ≈ 1.11, so the squared drift is approximately 1.2.
Asymptotic Spread Drift
For the sample spread applied to samples from Additive(μ, σ), the asymptotic distribution is approximately Normal.

The spread estimator computes all pairwise absolute differences and takes their median. For the Additive (Normal) distribution, the population spread equals c · σ ≈ 0.954 σ, as derived in the Asymptotic Spread Value section.

The asymptotic standard deviation of the sample spread for the Additive (Normal) distribution is:

σ_Spread / √n, where σ_Spread ≈ 0.72 σ

Applying the lemma with σ_T = σ_Spread and θ = c · σ:

Drift(Spread, X) = c · σ_Spread / θ

Since θ = c · σ asymptotically:

Drift(Spread, X) = σ_Spread / σ ≈ 0.72

Numerically, Drift(Spread, X) ≈ 0.72, so the squared drift is approximately 0.52.
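The StdDev constant in this chain reduces to a closed form that is easy to verify (an illustration using the stdlib normal distribution, not toolkit code):

```python
import math
from statistics import NormalDist

z75 = NormalDist().inv_cdf(0.75)    # ~0.6745
c = math.sqrt(2) * z75              # Spread constant for the Additive (Normal) distribution

# StdDev: sigma_T = sigma / sqrt(2), asymptotic value theta = sigma
drift_stddev = c / math.sqrt(2)     # equals Phi^-1(3/4)
print(round(drift_stddev ** 2, 3))  # ~0.455, matching the summary table's 0.45
```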
Summary
Summary for average estimators:

| Estimator | Drift | Drift² | 1/Drift² |
|---|---|---|---|
| Mean | 1.000 | 1.000 | 1.000 |
| Center | 1.023 | 1.047 | 0.955 |
| Median | 1.253 | 1.571 | 0.637 |
The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from Mean to Median while maintaining the same precision requires increasing the sample size by a factor of π/2 ≈ 1.571 (about 57% more observations). Similarly, switching from Mean to Center requires only about 5% more observations.
The inverse squared drift (rightmost column) equals the classical statistical efficiency relative to the Mean. The Mean achieves optimal performance (unit efficiency) for the Additive (Normal) distribution, as expected from classical theory. The Center maintains 95.5% efficiency while offering greater robustness to outliers, making it an attractive alternative when some contamination is possible. The Median, while most robust, operates at only 63.7% efficiency under purely Normal conditions.
Summary for dispersion estimators:

For the Additive (Normal) distribution, the asymptotic drift values reveal the relative precision of different dispersion estimators:

| Estimator | Drift | Drift² | Drift² relative to StdDev |
|---|---|---|---|
| StdDev | 0.674 | 0.455 | 1.00 |
| Spread | 0.72 | 0.52 | 1.16 |
| MAD | 1.11 | 1.22 | 2.7 |
The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from StdDev to MAD while maintaining the same precision requires increasing the sample size by a factor of about 2.7 (more than doubling the observations). Similarly, switching from StdDev to Spread requires a factor of about 1.16.

The StdDev achieves optimal performance for the Additive (Normal) distribution. The MAD requires about 2.7 times more data to match the precision while offering greater robustness to outliers. The Spread requires about 1.16 times more data to match the precision under purely Normal conditions while maintaining robustness.