mean: location parameter (center of the distribution), consistent with Center
stdDev: scale parameter (standard deviation), can be rescaled to Spread
Formation: the sum of many variables $X_1 + X_2 + \ldots + X_n$ under mild CLT (Central Limit Theorem) conditions (e.g., Lindeberg-Feller).
Origin: historically called Normal or Gaussian distribution after Carl Friedrich Gauss and others.
Rename Motivation: renamed to Additive to reflect its formation mechanism through addition.
Properties: symmetric, bell-shaped, characterized by central limit theorem convergence.
Applications: measurement errors, heights and weights in populations, test scores, temperature variations.
Characteristics: symmetric around the mean, light tails, finite variance.
Caution: no perfectly additive distributions exist in real data; all real-world measurements contain deviations. Traditional estimators like Mean and StdDev lack robustness to outliers; use them only when strong evidence supports approximate additivity with no extreme measurements.
Notes
The Additive (Normal) distribution has two parameters: the mean and the standard deviation, written as Additive(mean,stdDev).
Sampling (Box-Muller Transform)
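The toolkit samples Additive values using the Box-Muller transform (see Box & Muller 1958), which converts two independent UniformFloat draws into standard normal values. Given $U_1, U_2 \in [0, 1)$:
$$Z_0 = \sqrt{-2 \ln U_1} \, \cos(2 \pi U_2), \qquad Z_1 = \sqrt{-2 \ln U_1} \, \sin(2 \pi U_2)$$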
Both Z0 and Z1 are independent standard normal values. The implementation uses only Z0 to maintain cross-language determinism.
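The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's implementation: `random.Random` stands in for UniformFloat, the names `box_muller_z0` and `sample_additive` are hypothetical, and mapping $U_1$ to $1 - U_1$ to avoid $\ln 0$ is an assumption made here for safety.

```python
import math
import random


def box_muller_z0(u1: float, u2: float) -> float:
    """Return the first Box-Muller output Z0 for uniforms u1, u2 in [0, 1)."""
    # Assumption: shift u1 from [0, 1) to (0, 1] so that log() never sees 0.
    radius = math.sqrt(-2.0 * math.log(1.0 - u1))
    return radius * math.cos(2.0 * math.pi * u2)


def sample_additive(rng: random.Random, mean: float, std_dev: float) -> float:
    """Draw one Additive(mean, std_dev) value from two uniform draws."""
    u1 = rng.random()  # stand-in for the toolkit's UniformFloat
    u2 = rng.random()
    # Only Z0 is used; Z1 is discarded, as described in the text.
    return mean + std_dev * box_muller_z0(u1, u2)


rng = random.Random(42)
draws = [sample_additive(rng, 10.0, 2.0) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_sd = math.sqrt(
    sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)
)
print(round(sample_mean, 2), round(sample_sd, 2))
```

With a large number of draws, the printed mean and standard deviation should land close to the requested 10 and 2.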
Asymptotic Spread Value
Consider two independent draws $X$ and $Y$ from the $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$ distribution. The goal is to find the median of their absolute difference $|X - Y|$. Define the difference $D = X - Y$. By linearity of expectation, $E[D] = 0$. By independence, $\mathrm{Var}[D] = 2 \cdot \mathrm{stdDev}^2$. Thus $D$ has distribution $\mathrm{Additive}(0, \sqrt{2} \cdot \mathrm{stdDev})$, and the problem reduces to finding the median of $|D|$. The location parameter mean disappears, as expected, because absolute differences are invariant to shifts.
Let $\tau = \sqrt{2} \cdot \mathrm{stdDev}$, so that $D \sim \mathrm{Additive}(0, \tau)$. The random variable $|D|$ then follows the Half-Additive (Folded Normal) distribution with scale $\tau$. Its cumulative distribution function for $z \ge 0$ is
$$F_{|D|}(z) = \Pr(|D| \le z) = 2 \Phi\left(\frac{z}{\tau}\right) - 1$$
where $\Phi$ denotes the standard Additive (Normal) CDF.
The median $m$ is the point at which this CDF equals $\frac{1}{2}$. Setting $F_{|D|}(m) = \frac{1}{2}$ gives
$$2 \Phi\left(\frac{m}{\tau}\right) - 1 = \frac{1}{2} \;\Rightarrow\; \Phi\left(\frac{m}{\tau}\right) = \frac{3}{4}$$
Applying the inverse CDF yields $\frac{m}{\tau} = z_{0.75}$. Substituting back $\tau = \sqrt{2} \cdot \mathrm{stdDev}$ produces
$$\mathrm{Median}(|X - Y|) = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$$
Here $z_{0.75} := \Phi^{-1}(0.75) \approx 0.6744897502$. Numerically, the median absolute difference is approximately $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev} \approx 0.9538725524 \cdot \mathrm{stdDev}$. This expression depends only on the scale parameter stdDev, not on the mean, reflecting the translation invariance of the problem.
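The constant can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist` for $\Phi^{-1}$ and compares the closed form against a simple simulation; the helper name `simulated_median_abs_diff`, the seed, and the sample size are illustrative choices, not part of the toolkit.

```python
import math
import random
from statistics import NormalDist, median

# z_0.75 is the 75th percentile of the standard normal distribution.
z075 = NormalDist().inv_cdf(0.75)
spread_factor = math.sqrt(2.0) * z075  # Median(|X - Y|) / stdDev


def simulated_median_abs_diff(std_dev: float, n_pairs: int, seed: int) -> float:
    """Median of |X - Y| over n_pairs independent normal pairs."""
    rng = random.Random(seed)
    diffs = [
        abs(rng.gauss(0.0, std_dev) - rng.gauss(0.0, std_dev))
        for _ in range(n_pairs)
    ]
    return median(diffs)


std_dev = 3.0
theory = spread_factor * std_dev
estimate = simulated_median_abs_diff(std_dev, 200_000, seed=1)
print(f"z_0.75        = {z075:.10f}")
print(f"sqrt(2)*z     = {spread_factor:.10f}")
print(f"theory, simulated: {theory:.4f}, {estimate:.4f}")
```

The mean is set to zero in the simulation because, as derived above, it cancels in the difference.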
Lemma: Average Estimator Drift Formula
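For average estimators $T_n$ with asymptotic standard deviation $a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$ around the mean $\mu$, define $\mathrm{RelSpread}[T_n] := \frac{\mathrm{Spread}[T_n]}{\mathrm{Spread}[X]}$. In the Additive (Normal) case, $\mathrm{Spread}[X] = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$.
For any average estimator $T_n$ with asymptotic standard deviation $a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$ around the mean $\mu$, the drift calculation follows:
The spread of two independent estimates: $\mathrm{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$
The relative spread: $\mathrm{RelSpread}[T_n] = \frac{a}{\sqrt{n}}$
The asymptotic drift: $\mathrm{Drift}(T, X) = a$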
Asymptotic Mean Drift
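For the sample mean $\mathrm{Mean}(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of Mean is also Additive with mean $\mathrm{mean}$ and standard deviation $\frac{\mathrm{stdDev}}{\sqrt{n}}$.
Using the lemma with $a = 1$ (since the standard deviation is $\frac{\mathrm{stdDev}}{\sqrt{n}}$):
$$\mathrm{Drift}(\mathrm{Mean}, X) = 1$$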
Mean achieves unit drift under the Additive (Normal) distribution, serving as the natural baseline for comparison. Mean is the optimal estimator under the Additive (Normal) distribution: no other estimator achieves lower Drift.
Asymptotic Median Drift
For the sample median $\mathrm{Median}(x)$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic sampling distribution of Median is approximately Additive (Normal) with mean $\mathrm{mean}$ and standard deviation $\sqrt{\frac{\pi}{2}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$.
This result follows from the asymptotic theory of order statistics. For the median of a sample from a continuous distribution with density $f$ and cumulative distribution function $F$, the asymptotic variance is $\frac{1}{4 n \left[ f(F^{-1}(0.5)) \right]^2}$. For the Additive (Normal) distribution with standard deviation stdDev, the density at the median (which equals the mean) is $\frac{1}{\mathrm{stdDev} \sqrt{2 \pi}}$. Thus the asymptotic variance becomes $\frac{\pi \cdot \mathrm{stdDev}^2}{2 n}$.
Using the lemma with $a = \sqrt{\frac{\pi}{2}}$:
$$\mathrm{Drift}(\mathrm{Median}, X) = \sqrt{\frac{\pi}{2}}$$
Numerically, $\sqrt{\frac{\pi}{2}} \approx 1.2533$, so the median has approximately 25% higher drift than the mean under the Additive (Normal) distribution.
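The order-statistics formula can be evaluated directly as a sanity check. The following sketch plugs the normal density at the median into $\frac{1}{4 n f^2}$ and recovers the constant $a$; the function name `median_asymptotic_sd` and the chosen `std_dev` and `n` are arbitrary illustrations.

```python
import math
from statistics import NormalDist


def median_asymptotic_sd(std_dev: float, n: int) -> float:
    """Asymptotic standard deviation of the sample median for normal data."""
    dist = NormalDist(0.0, std_dev)
    # Density evaluated at the population median F^{-1}(0.5).
    density_at_median = dist.pdf(dist.inv_cdf(0.5))
    variance = 1.0 / (4.0 * n * density_at_median ** 2)
    return math.sqrt(variance)


std_dev, n = 2.0, 400
# Divide out stdDev / sqrt(n) to isolate the constant a from the lemma.
a = median_asymptotic_sd(std_dev, n) * math.sqrt(n) / std_dev
print(f"a = {a:.4f}, sqrt(pi/2) = {math.sqrt(math.pi / 2.0):.4f}")
```

The recovered constant does not depend on the particular `std_dev` or `n`, which is what makes drift a property of the estimator and the distribution alone.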
Asymptotic Center Drift
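For the sample center $\mathrm{Center}(x) = \underset{1 \le i \le j \le n}{\mathrm{Median}} \left( \frac{x_i + x_j}{2} \right)$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, its asymptotic sampling distribution must be determined.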
The center estimator computes all pairwise averages (including i=j) and takes their median. For the Additive (Normal) distribution, asymptotic theory shows that the center estimator is asymptotically Additive (Normal) with mean mean.
The exact asymptotic variance of the center estimator for the Additive (Normal) distribution is:
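$$\mathrm{Var}[\mathrm{Center}(X_{1:n})] = \frac{\pi \cdot \mathrm{stdDev}^2}{3 n}$$
This gives an asymptotic standard deviation of:
$$\mathrm{StdDev}[\mathrm{Center}(X_{1:n})] = \sqrt{\frac{\pi}{3}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$
Using the lemma with $a = \sqrt{\frac{\pi}{3}}$:
$$\mathrm{Drift}(\mathrm{Center}, X) = \sqrt{\frac{\pi}{3}}$$
Numerically, $\sqrt{\frac{\pi}{3}} \approx 1.0233$, so the center estimator achieves a drift very close to 1 under the Additive (Normal) distribution, performing nearly as well as the mean while offering greater robustness to outliers.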
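To make the definition of the center estimator concrete, here is a naive sketch that enumerates all pairwise averages with $i \le j$ and takes their median. It is quadratic in the sample size and only meant to illustrate the definition; the function name `center` is chosen here, and the toolkit's actual implementation may differ.

```python
from itertools import combinations_with_replacement
from statistics import median


def center(x):
    """Median of (x_i + x_j) / 2 over all pairs i <= j (naive O(n^2) version)."""
    pairwise_averages = [
        (a + b) / 2.0 for a, b in combinations_with_replacement(x, 2)
    ]
    return median(pairwise_averages)


# A small sample with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(center(sample))             # 3.0
print(sum(sample) / len(sample))  # 22.0
```

On this small sample the single extreme value drags the mean to 22 while the center stays at 3, which illustrates the robustness mentioned above.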
Lemma: Dispersion Estimator Drift Formula
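For dispersion estimators $T_n$ with asymptotic center $b \cdot \mathrm{stdDev}$ and standard deviation $a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$, define $\mathrm{RelSpread}[T_n] := \frac{\mathrm{Spread}[T_n]}{b \cdot \mathrm{stdDev}}$.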
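For any dispersion estimator $T_n$ with asymptotic distribution $T_n \overset{\text{approx}}{\sim} \mathrm{Additive}\left(b \cdot \mathrm{stdDev}, \; a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}\right)$, that is, with variance $\frac{(a \cdot \mathrm{stdDev})^2}{n}$, the drift calculation follows: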
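The spread of two independent estimates: $\mathrm{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$
The relative spread: $\mathrm{RelSpread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b \sqrt{n}}$
The asymptotic drift: $\mathrm{Drift}(T, X) = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b}$
Note: The $\sqrt{2}$ factor comes from the standard deviation of the difference $D = T_1 - T_2$ of two independent estimates, and the $z_{0.75}$ factor converts this standard deviation to the median absolute difference.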
Asymptotic StdDev Drift
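For the sample standard deviation $\mathrm{StdDev}(x) = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mathrm{Mean}(x))^2}$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of StdDev is approximately Additive (Normal) for large $n$ with mean $\mathrm{stdDev}$ and standard deviation $\frac{\mathrm{stdDev}}{\sqrt{2 n}}$.
Applying the lemma with $a = \frac{1}{\sqrt{2}}$ and $b = 1$:
$$\mathrm{Drift}(\mathrm{StdDev}, X) = \sqrt{2} \cdot z_{0.75} \cdot \frac{1}{\sqrt{2}} = z_{0.75} \approx 0.67$$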
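Asymptotic MAD Drift
For the median absolute deviation $\mathrm{MAD}(x) = \mathrm{Median}(|x_i - \mathrm{Median}(x)|)$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately Additive (Normal).
For the Additive (Normal) distribution, the population MAD equals $z_{0.75} \cdot \mathrm{stdDev}$. The asymptotic standard deviation of the sample MAD is:
$$\mathrm{StdDev}[\mathrm{MAD}(X_{1:n})] = c_{\mathrm{mad}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$
where $c_{\mathrm{mad}} \approx 0.78$.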
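For reference, the MAD definition translates into a short function. This is an illustrative sketch with a hypothetical function name `mad`; it also shows how dividing by $z_{0.75}$ rescales the MAD to the stdDev scale, which follows from the population value $z_{0.75} \cdot \mathrm{stdDev}$ stated above.

```python
from statistics import NormalDist, median


def mad(x):
    """Median absolute deviation around the sample median."""
    m = median(x)
    return median([abs(v - m) for v in x])


small = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mad(small))  # 1.0

# Rescale to the stdDev scale using the population relation MAD = z_0.75 * stdDev.
z075 = NormalDist().inv_cdf(0.75)
print(mad(small) / z075)
```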
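Applying the lemma with $a = c_{\mathrm{mad}}$ and $b = z_{0.75}$:
$$\mathrm{Spread}[\mathrm{MAD}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$
Since $\mathrm{Center}[\mathrm{MAD}(X_{1:n})] \approx z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:
$$\mathrm{Drift}(\mathrm{MAD}, X) = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}}}{z_{0.75}} = \sqrt{2} \cdot c_{\mathrm{mad}} \approx 1.10$$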
Asymptotic Spread Drift
For the sample spread $\mathrm{Spread}(x) = \underset{1 \le i < j \le n}{\mathrm{Median}} |x_i - x_j|$ applied to samples from $\mathrm{Additive}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately Additive (Normal).
The spread estimator computes all pairwise absolute differences and takes their median. For the Additive (Normal) distribution, the population spread equals $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ as derived in the Asymptotic Spread Value section.
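The pairwise definition can be sketched as follows. The function name `spread` and the simulation settings (seed, sample size) are illustrative assumptions, and the quadratic enumeration is only meant to mirror the definition, not to be efficient.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist, median


def spread(x):
    """Median of |x_i - x_j| over all pairs i < j (naive O(n^2) version)."""
    return median([abs(a - b) for a, b in combinations(x, 2)])


# Small hand-checkable sample with one extreme value.
small = [1.0, 2.0, 3.0, 4.0, 100.0]
print(spread(small))  # 2.5

# Simulated normal sample: the estimate should be near sqrt(2) * z_0.75 * stdDev.
rng = random.Random(7)
std_dev = 2.0
sample = [rng.gauss(5.0, std_dev) for _ in range(1000)]
theory = math.sqrt(2.0) * NormalDist().inv_cdf(0.75) * std_dev
print(spread(sample), theory)
```

Shifting every observation by a constant leaves the result unchanged, since only differences enter the computation.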
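The asymptotic standard deviation of the sample spread for the Additive (Normal) distribution is:
$$\mathrm{StdDev}[\mathrm{Spread}(X_{1:n})] = c_{\mathrm{spr}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$
where $c_{\mathrm{spr}} \approx 0.72$.
Applying the lemma with $a = c_{\mathrm{spr}}$ and $b = \sqrt{2} \cdot z_{0.75}$:
$$\mathrm{Spread}[\mathrm{Spread}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$
Since $\mathrm{Center}[\mathrm{Spread}(X_{1:n})] \approx \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:
$$\mathrm{Drift}(\mathrm{Spread}, X) = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}}}{\sqrt{2} \cdot z_{0.75}} = c_{\mathrm{spr}} \approx 0.72$$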
Summary for average estimators:
For the Additive (Normal) distribution, the asymptotic drift values reveal the relative precision of different average estimators:

| Estimator | $\mathrm{Drift}(E, X)$ | $\mathrm{Drift}^2(E, X)$ | $1 / \mathrm{Drift}^2(E, X)$ |
|-----------|------------------------|--------------------------|------------------------------|
| Mean      | 1                      | 1                        | 1                            |
| Median    | ≈1.253                 | ≈1.571                   | ≈0.637                       |
| Center    | ≈1.023                 | ≈1.047                   | ≈0.955                       |

The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from Mean to Median while maintaining the same precision requires increasing the sample size by a factor of $\frac{\pi}{2} \approx 1.571$ (about 57% more observations). Similarly, switching from Mean to Center requires only about 5% more observations.
The inverse squared drift (rightmost column) equals the classical statistical efficiency relative to the Mean. The Mean achieves optimal performance (unit efficiency) for the Additive (Normal) distribution, as expected from classical theory. The Center maintains 95.5% efficiency while offering greater robustness to outliers, making it an attractive alternative when some contamination is possible. The Median, while most robust, operates at only 63.7% efficiency under purely Additive (Normal) conditions.
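The sample-size factors and efficiencies for the average estimators follow directly from the drift constants derived earlier. A short sketch that recomputes them (the dictionary layout and variable names are illustrative only):

```python
import math

# Asymptotic drift constants under the Additive (Normal) distribution.
drift = {
    "Mean": 1.0,
    "Median": math.sqrt(math.pi / 2.0),
    "Center": math.sqrt(math.pi / 3.0),
}

for name, d in drift.items():
    sample_size_factor = d ** 2      # observations needed relative to Mean
    efficiency = 1.0 / d ** 2        # classical efficiency relative to Mean
    print(
        f"{name:<7} drift={d:.4f} "
        f"drift^2={sample_size_factor:.3f} efficiency={efficiency:.3f}"
    )
```

The squared drift for Median reproduces the factor of about 1.571, and the inverse squared drifts reproduce the 63.7% and 95.5% efficiencies quoted above.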
Summary for dispersion estimators:
For the Additive (Normal) distribution, the asymptotic drift values reveal the relative precision of different dispersion estimators:
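| Estimator | $\mathrm{Drift}(E, X)$ | $\mathrm{Drift}^2(E, X)$ | $1 / \mathrm{Drift}^2(E, X)$ |
|-----------|------------------------|--------------------------|------------------------------|
| StdDev    | ≈0.67                  | ≈0.45                    | ≈2.22                        |
| MAD       | ≈1.10                  | ≈1.22                    | ≈0.82                        |
| Spread    | ≈0.72                  | ≈0.52                    | ≈1.92                        |
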
The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from StdDev to MAD while maintaining the same precision requires increasing the sample size by a factor of $\frac{1.22}{0.45} \approx 2.71$ (more than doubling the observations). Similarly, switching from StdDev to Spread requires a factor of $\frac{0.52}{0.45} \approx 1.16$.
The StdDev achieves optimal performance for the Additive (Normal) distribution. The MAD requires about 2.7 times more data to match the StdDev precision while offering greater robustness to outliers. The Spread requires about 1.16 times more data to match the StdDev precision under purely Additive (Normal) conditions while maintaining robustness.
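The drift column of the dispersion table can be recomputed from the dispersion lemma, $\mathrm{Drift}(T, X) = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b}$. The sketch below assumes the two-digit constants quoted in the text for MAD and Spread (0.78 and 0.72); the function name `dispersion_drift` and the dictionary layout are hypothetical.

```python
import math
from statistics import NormalDist

Z075 = NormalDist().inv_cdf(0.75)
SQRT2 = math.sqrt(2.0)


def dispersion_drift(a: float, b: float) -> float:
    """Drift of a dispersion estimator with center b*stdDev and sd a*stdDev/sqrt(n)."""
    return SQRT2 * Z075 * a / b


# Two-digit constants taken from the text (approximate values).
C_MAD = 0.78
C_SPR = 0.72

# (a, b) pairs for each estimator, as used in the derivations.
estimators = {
    "StdDev": (1.0 / SQRT2, 1.0),
    "MAD": (C_MAD, Z075),
    "Spread": (C_SPR, SQRT2 * Z075),
}

drifts = {name: dispersion_drift(a, b) for name, (a, b) in estimators.items()}
for name, d in drifts.items():
    print(f"{name:<7} drift={d:.2f} drift^2={d ** 2:.2f}")
```

Because the inputs for MAD and Spread are themselves rounded, derived quantities such as ratios of squared drifts can differ slightly in the last digit depending on where rounding is applied.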