Additive

$\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$
  • $\mathrm{mean}$: location parameter (center of the distribution), consistent with $\operatorname{Center}$
  • $\mathrm{stdDev}$: scale parameter (standard deviation), can be rescaled to $\operatorname{Spread}$

  • Formation: the sum of many variables $X_1 + X_2 + \ldots + X_n$ under mild CLT (Central Limit Theorem) conditions (e.g., Lindeberg-Feller).
  • Origin: historically called Normal or Gaussian distribution after Carl Friedrich Gauss and others.
  • Rename Motivation: renamed to $\underline{\operatorname{Additive}}$ to reflect its formation mechanism through addition.
  • Properties: symmetric, bell-shaped, characterized by central limit theorem convergence.
  • Applications: measurement errors, heights and weights in populations, test scores, temperature variations.
  • Characteristics: symmetric around the mean, light tails, finite variance.
  • Caution: no perfectly additive distributions exist in real data; all real-world measurements contain deviations. Traditional estimators like $\operatorname{Mean}$ and $\operatorname{StdDev}$ lack robustness to outliers; use them only when strong evidence supports approximate additivity with no extreme measurements.

Notes

The $\underline{\operatorname{Additive}}$ (Normal) distribution has two parameters: the mean and the standard deviation, written as $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$.

Sampling (Box-Muller Transform)

The toolkit samples $\underline{\operatorname{Additive}}$ values using the Box-Muller transform (see Box & Muller 1958), which converts two independent $\operatorname{UniformFloat}$ draws into standard normal values. Given $U_1, U_2 \in [0, 1)$:

$$\begin{aligned} Z_0 &= \sqrt{-2 \ln(U_1)} \cos(2 \pi U_2) \\ Z_1 &= \sqrt{-2 \ln(U_1)} \sin(2 \pi U_2) \end{aligned}$$

Both $Z_0$ and $Z_1$ are independent standard normal values. The implementation uses only $Z_0$ to maintain cross-language determinism.
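The transform above can be sketched in Python as follows. This is a minimal illustration, not the toolkit's implementation: the function name `additive_sample`, the use of `random.Random` as the uniform source, and the redraw when `u1 == 0.0` (to avoid taking the logarithm of zero) are assumptions made here.

```python
import math
import random


def additive_sample(mean, std_dev, rng):
    """Draw one Additive(mean, std_dev) value via Box-Muller, keeping only Z0.

    `rng` is any object with a random() method returning floats in [0, 1).
    """
    u1 = rng.random()
    while u1 == 0.0:  # assumption: redraw to avoid log(0)
        u1 = rng.random()
    u2 = rng.random()
    z0 = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + std_dev * z0  # shift and scale the standard normal value


rng = random.Random(42)
draws = [additive_sample(10.0, 2.0, rng) for _ in range(5)]
print(draws)
```

Discarding the second value costs one extra uniform draw per sample, but every sample then consumes exactly two uniforms in a fixed order, which is what makes the output reproducible across implementations that share the same uniform generator.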

Asymptotic Spread Value

Consider two independent draws $X$ and $Y$ from the $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$ distribution. The goal is to find the median of their absolute difference $\lvert X-Y \rvert$. Define the difference $D = X - Y$. By linearity of expectation, $\mathbb{E}[D] = 0$. By independence, $\operatorname{Var}[D] = 2 \cdot \mathrm{stdDev}^2$. Thus $D$ has distribution $\underline{\operatorname{Additive}}(0, \sqrt{2} \cdot \mathrm{stdDev})$, and the problem reduces to finding the median of $\lvert D \rvert$. The location parameter $\mathrm{mean}$ disappears, as expected, because absolute differences are invariant to shifts.

Let $\tau=\sqrt{2} \cdot \mathrm{stdDev}$, so that $D \sim \underline{\operatorname{Additive}}(0, \tau)$. The random variable $\lvert D \rvert$ then follows the Half-$\underline{\operatorname{Additive}}$ (Folded Normal) distribution with scale $\tau$. Its cumulative distribution function for $z \geq 0$ becomes

$$F_{\lvert D \rvert}(z) = \Pr(\lvert D \rvert \leq z) = 2 \Phi \left(\frac{z}{\tau}\right) - 1$$

where $\Phi$ denotes the standard $\underline{\operatorname{Additive}}$ (Normal) CDF.

The median $m$ is the point at which this CDF equals $\frac{1}{2}$. Setting $F_{\lvert D \rvert}(m)=\frac{1}{2}$ gives

$$2 \Phi \left(\frac{m}{\tau}\right) - 1 = \frac{1}{2} \Rightarrow \Phi \left(\frac{m}{\tau}\right) = \frac{3}{4}$$

Applying the inverse CDF yields $\frac{m}{\tau} = z_{0.75}$. Substituting back $\tau = \sqrt{2} \cdot \mathrm{stdDev}$ produces

$$\operatorname{Median}(\lvert X-Y \rvert) = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$$

Define $z_{0.75} := \Phi^{-1}(0.75) \approx 0.6744897502$. Numerically, the median absolute difference is approximately $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev} \approx 0.9538725524 \cdot \mathrm{stdDev}$. This expression depends only on the scale parameter $\mathrm{stdDev}$, not on the mean, reflecting the translation invariance of the problem.
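The two constants can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` for the inverse CDF and adds a small Monte Carlo cross-check; the helper name `simulated_median_abs_diff` and the chosen sample sizes are illustrative assumptions, not part of the toolkit.

```python
import math
import random
from statistics import NormalDist, median

Z075 = NormalDist().inv_cdf(0.75)        # Phi^{-1}(0.75)
SPREAD_FACTOR = math.sqrt(2.0) * Z075    # Median(|X - Y|) per unit stdDev


def simulated_median_abs_diff(mean, std_dev, pairs, seed):
    """Monte Carlo estimate of Median(|X - Y|) for independent normal X, Y."""
    rng = random.Random(seed)
    diffs = [abs(rng.gauss(mean, std_dev) - rng.gauss(mean, std_dev))
             for _ in range(pairs)]
    return median(diffs)


print(round(Z075, 10), round(SPREAD_FACTOR, 10))
# The ratio below should land near SPREAD_FACTOR regardless of the mean used.
print(simulated_median_abs_diff(5.0, 3.0, 50_000, seed=1) / 3.0)
```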

Lemma: Average Estimator Drift Formula

For average estimators $T_n$ with asymptotic standard deviation $a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$ around the mean $\mu$, define $\operatorname{RelSpread}[T_n] := \frac{\operatorname{Spread}[T_n]}{\operatorname{Spread}[X]}$. In the $\underline{\operatorname{Additive}}$ (Normal) case, $\operatorname{Spread}[X] = \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$.

For any such estimator, the drift calculation follows:

  • The spread of two independent estimates: $\operatorname{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$
  • The relative spread: $\operatorname{RelSpread}[T_n] = \frac{a}{\sqrt{n}}$
  • The asymptotic drift: $\operatorname{Drift}(T, X) = a$

Asymptotic Mean Drift

For the sample mean $\operatorname{Mean}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n x_i$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of $\operatorname{Mean}$ is also $\underline{\operatorname{Additive}}$ (Normal) with mean $\mathrm{mean}$ and standard deviation $\frac{\mathrm{stdDev}}{\sqrt{n}}$.

Using the lemma with $a = 1$ (since the standard deviation is $\frac{\mathrm{stdDev}}{\sqrt{n}}$):

$$\operatorname{Drift}(\operatorname{Mean}, X) = 1$$

$\operatorname{Mean}$ achieves unit drift under the $\underline{\operatorname{Additive}}$ (Normal) distribution, serving as the natural baseline for comparison. $\operatorname{Mean}$ is the optimal estimator under the $\underline{\operatorname{Additive}}$ (Normal) distribution: no other estimator achieves lower $\operatorname{Drift}$.

Asymptotic Median Drift

For the sample median $\operatorname{Median}(\mathbf{x})$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic sampling distribution of $\operatorname{Median}$ is approximately $\underline{\operatorname{Additive}}$ (Normal) with mean $\mathrm{mean}$ and standard deviation $\sqrt{\frac{\pi}{2}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$.

This result follows from the asymptotic theory of order statistics. For the median of a sample from a continuous distribution with density $f$ and cumulative distribution $F$, the asymptotic variance is $\frac{1}{4n[f(F^{-1}(0.5))]^2}$. For the $\underline{\operatorname{Additive}}$ (Normal) distribution with standard deviation $\mathrm{stdDev}$, the density at the median (which equals the mean) is $\frac{1}{\mathrm{stdDev} \sqrt{2 \pi}}$. Thus the asymptotic variance becomes $\frac{\pi \cdot \mathrm{stdDev}^2}{2n}$.

Using the lemma with $a = \sqrt{\frac{\pi}{2}}$:

$$\operatorname{Drift}(\operatorname{Median}, X) = \sqrt{\frac{\pi}{2}}$$

Numerically, $\sqrt{\frac{\pi}{2}} \approx 1.2533$, so the median has approximately 25% higher drift than the mean under the $\underline{\operatorname{Additive}}$ (Normal) distribution.
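The drift values for the mean and the median can be approximated by simulation. The sketch below is an illustration under stated assumptions: the helper `empirical_drift`, the sample size `n=101`, and the number of replications are choices made here, and a finite-sample run only approximates the asymptotic values 1 and about 1.2533.

```python
import math
import random
from statistics import NormalDist, fmean, median

Z075 = NormalDist().inv_cdf(0.75)


def empirical_drift(estimator, n, pairs, seed, std_dev=1.0):
    """Estimate sqrt(n) * Spread[T_n] / Spread[X] by simulation.

    Spread[T_n] is approximated by the median absolute difference over
    `pairs` independent pairs of estimates; Spread[X] uses the closed form
    sqrt(2) * z_0.75 * stdDev.
    """
    rng = random.Random(seed)

    def one_estimate():
        return estimator([rng.gauss(0.0, std_dev) for _ in range(n)])

    diffs = [abs(one_estimate() - one_estimate()) for _ in range(pairs)]
    spread_t = median(diffs)
    spread_x = math.sqrt(2.0) * Z075 * std_dev
    return math.sqrt(n) * spread_t / spread_x


print("Mean  :", empirical_drift(fmean, n=101, pairs=2000, seed=1))
print("Median:", empirical_drift(median, n=101, pairs=2000, seed=1))
```

With a few thousand replications the estimates typically carry a relative error of a few percent, so they should be read as a sanity check on the derivation rather than as a way to recover the constants to several digits.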

Asymptotic Center Drift

For the sample center $\operatorname{Center}(\mathbf{x}) = \underset{1 \leq i \leq j \leq n}{\operatorname{Median}} \frac{x_i + x_j}{2}$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, its asymptotic sampling distribution must be determined.

The center estimator computes all pairwise averages (including $i=j$) and takes their median. For the $\underline{\operatorname{Additive}}$ (Normal) distribution, asymptotic theory shows that the center estimator is asymptotically $\underline{\operatorname{Additive}}$ (Normal) with mean $\mathrm{mean}$.

The exact asymptotic variance of the center estimator for the $\underline{\operatorname{Additive}}$ (Normal) distribution is:

$$\operatorname{Var}[\operatorname{Center}(X_{1:n})] = \frac{\pi \cdot \mathrm{stdDev}^2}{3n}$$

This gives an asymptotic standard deviation of:

$$\operatorname{StdDev}[\operatorname{Center}(X_{1:n})] = \sqrt{\frac{\pi}{3}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

Using the lemma with $a = \sqrt{\frac{\pi}{3}}$:

$$\operatorname{Drift}(\operatorname{Center}, X) = \sqrt{\frac{\pi}{3}}$$

Numerically, $\sqrt{\frac{\pi}{3}} \approx 1.0233$, so the center estimator achieves a drift very close to 1 under the $\underline{\operatorname{Additive}}$ (Normal) distribution, performing nearly as well as the mean while offering greater robustness to outliers.
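The definition of the center estimator translates directly into code. The following is a naive quadratic-time sketch written for this section (the function name `center` and the sample data are illustrative; the toolkit may use a faster algorithm):

```python
from itertools import combinations_with_replacement
from statistics import fmean, median


def center(xs):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    if not xs:
        raise ValueError("center() requires at least one value")
    averages = [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]
    return median(averages)


clean = [1.0, 2.0, 3.0]
dirty = [1.0, 2.0, 3.0, 100.0]  # the same data with one extreme value added

print(center(clean), fmean(clean))  # 2.0 2.0
print(center(dirty), fmean(dirty))  # 2.75 26.5
```

In this small example a single extreme value moves the mean from 2.0 to 26.5 but moves the center only to 2.75, which illustrates the robustness mentioned above.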

Lemma: Dispersion Estimator Drift Formula

For dispersion estimators $T_n$ with asymptotic center $b \cdot \mathrm{stdDev}$ and standard deviation $a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$, define $\operatorname{RelSpread}[T_n] := \frac{\operatorname{Spread}[T_n]}{b \cdot \mathrm{stdDev}}$.

For any dispersion estimator $T_n$ with asymptotic distribution $T_n \overset{\text{approx}}{\sim} \underline{\operatorname{Additive}}\left(b \cdot \mathrm{stdDev}, a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}\right)$, that is, with asymptotic variance $\frac{(a \cdot \mathrm{stdDev})^2}{n}$, the drift calculation follows:

  • The spread of two independent estimates: $\operatorname{Spread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot a \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$
  • The relative spread: $\operatorname{RelSpread}[T_n] = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b \sqrt{n}}$
  • The asymptotic drift: $\operatorname{Drift}(T, X) = \sqrt{2} \cdot z_{0.75} \cdot \frac{a}{b}$

Note: The $\sqrt{2}$ factor comes from the standard deviation of the difference $D = T_1 - T_2$ of two independent estimates, and the $z_{0.75}$ factor converts this standard deviation to the median absolute difference.
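The lemma reduces each dispersion drift to plugging two constants into one formula. A minimal sketch follows, assuming the rounded constants quoted in the sections below (0.78 for the MAD and 0.72 for the spread); the function name `dispersion_drift` is introduced here for illustration only.

```python
import math
from statistics import NormalDist

Z075 = NormalDist().inv_cdf(0.75)


def dispersion_drift(a, b):
    """Asymptotic drift sqrt(2) * z_0.75 * a / b from the lemma.

    a: constant in the asymptotic standard deviation a * stdDev / sqrt(n)
    b: constant in the asymptotic center b * stdDev
    """
    return math.sqrt(2.0) * Z075 * a / b


# a and b as used in the StdDev, MAD and Spread sections (0.78, 0.72 rounded).
drift_stddev = dispersion_drift(a=1.0 / math.sqrt(2.0), b=1.0)
drift_mad = dispersion_drift(a=0.78, b=Z075)
drift_spread = dispersion_drift(a=0.72, b=math.sqrt(2.0) * Z075)

print(round(drift_stddev, 2), round(drift_mad, 2), round(drift_spread, 2))
```

Because the MAD and spread constants are only given to two digits, the last two results are accurate to about two digits as well.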

Asymptotic StdDev Drift

For the sample standard deviation $\operatorname{StdDev}(\mathbf{x}) = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \operatorname{Mean}(\mathbf{x}))^2}$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the sampling distribution of $\operatorname{StdDev}$ is approximately $\underline{\operatorname{Additive}}$ (Normal) for large $n$ with mean $\mathrm{stdDev}$ and standard deviation $\frac{\mathrm{stdDev}}{\sqrt{2n}}$.

Applying the lemma with $a = \frac{1}{\sqrt{2}}$ and $b = 1$:

$$\operatorname{Spread}[\operatorname{StdDev}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot \frac{1}{\sqrt{2}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}} = z_{0.75} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

For the dispersion drift, we use the relative spread formula:

$$\operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = \frac{\operatorname{Spread}[\operatorname{StdDev}(X_{1:n})]}{\operatorname{Center}[\operatorname{StdDev}(X_{1:n})]}$$

Since $\operatorname{Center}[\operatorname{StdDev}(X_{1:n})] \approx \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = \frac{z_{0.75} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}}{\mathrm{stdDev}} = \frac{z_{0.75}}{\sqrt{n}}$$

Therefore:

$$\operatorname{Drift}(\operatorname{StdDev}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{StdDev}(X_{1:n})] = z_{0.75}$$

Numerically, $z_{0.75} \approx 0.67449$.

Asymptotic MAD Drift

For the median absolute deviation $\operatorname{MAD}(\mathbf{x}) = \operatorname{Median}(\lvert x_i - \operatorname{Median}(\mathbf{x}) \rvert)$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately $\underline{\operatorname{Additive}}$ (Normal).

For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the population MAD equals $z_{0.75} \cdot \mathrm{stdDev}$. The asymptotic standard deviation of the sample MAD is:

$$\operatorname{StdDev}[\operatorname{MAD}(X_{1:n})] = c_{\mathrm{mad}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

where $c_{\mathrm{mad}} \approx 0.78$.

Applying the lemma with $a = c_{\mathrm{mad}}$ and $b = z_{0.75}$:

$$\operatorname{Spread}[\operatorname{MAD}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

Since $\operatorname{Center}[\operatorname{MAD}(X_{1:n})] \approx z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{MAD}(X_{1:n})] = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{mad}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}}{z_{0.75} \cdot \mathrm{stdDev}} = \frac{\sqrt{2} \cdot c_{\mathrm{mad}}}{\sqrt{n}}$$

Therefore:

$$\operatorname{Drift}(\operatorname{MAD}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{MAD}(X_{1:n})] = \sqrt{2} \cdot c_{\mathrm{mad}}$$

Numerically, $\sqrt{2} \cdot c_{\mathrm{mad}} \approx \sqrt{2} \cdot 0.78 \approx 1.10$.

Asymptotic Spread Drift

For the sample spread $\operatorname{Spread}(\mathbf{x}) = \underset{1 \leq i < j \leq n}{\operatorname{Median}} \lvert x_i - x_j \rvert$ applied to samples from $\underline{\operatorname{Additive}}(\mathrm{mean}, \mathrm{stdDev})$, the asymptotic distribution is approximately $\underline{\operatorname{Additive}}$ (Normal).

The spread estimator computes all pairwise absolute differences and takes their median. For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the population spread equals $\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ as derived in the Asymptotic Spread Value section.

The asymptotic standard deviation of the sample spread for the $\underline{\operatorname{Additive}}$ (Normal) distribution is:

$$\operatorname{StdDev}[\operatorname{Spread}(X_{1:n})] = c_{\mathrm{spr}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

where $c_{\mathrm{spr}} \approx 0.72$.

Applying the lemma with $a = c_{\mathrm{spr}}$ and $b = \sqrt{2} \cdot z_{0.75}$:

$$\operatorname{Spread}[\operatorname{Spread}(X_{1:n})] = \sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}$$

Since $\operatorname{Center}[\operatorname{Spread}(X_{1:n})] \approx \sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}$ asymptotically:

$$\operatorname{RelSpread}[\operatorname{Spread}(X_{1:n})] = \frac{\sqrt{2} \cdot z_{0.75} \cdot c_{\mathrm{spr}} \cdot \frac{\mathrm{stdDev}}{\sqrt{n}}}{\sqrt{2} \cdot z_{0.75} \cdot \mathrm{stdDev}} = \frac{c_{\mathrm{spr}}}{\sqrt{n}}$$

Therefore:

$$\operatorname{Drift}(\operatorname{Spread}, X) = \lim_{n \to \infty} \sqrt{n} \cdot \operatorname{RelSpread}[\operatorname{Spread}(X_{1:n})] = c_{\mathrm{spr}}$$

Numerically, $c_{\mathrm{spr}} \approx 0.72$.
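For reference, the sample spread defined in this section can be computed with a naive quadratic-time sketch such as the one below. The function name `spread` and the example data are illustrative assumptions; the toolkit may use a more efficient algorithm.

```python
import random
from itertools import combinations
from statistics import median


def spread(xs):
    """Median of all pairwise absolute differences |x_i - x_j| with i < j."""
    if len(xs) < 2:
        raise ValueError("spread() requires at least two values")
    return median(abs(a - b) for a, b in combinations(xs, 2))


print(spread([1.0, 2.0, 4.0, 8.0]))  # 3.5

# On a large normal sample the ratio spread / stdDev should be near 0.954.
rng = random.Random(0)
sample = [rng.gauss(50.0, 2.0) for _ in range(300)]
print(spread(sample) / 2.0)
```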

Summary

Summary for average estimators:

| Estimator | $\operatorname{Drift}(E, X)$ | $\operatorname{Drift}^2(E, X)$ | $\frac{1}{\operatorname{Drift}^2(E, X)}$ |
| --- | --- | --- | --- |
| $\operatorname{Mean}$ | $1$ | $1$ | $1$ |
| $\operatorname{Median}$ | $\approx 1.253$ | $\frac{\pi}{2} \approx 1.571$ | $\frac{2}{\pi} \approx 0.637$ |
| $\operatorname{Center}$ | $\approx 1.023$ | $\frac{\pi}{3} \approx 1.047$ | $\frac{3}{\pi} \approx 0.955$ |

The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from $\operatorname{Mean}$ to $\operatorname{Median}$ while maintaining the same precision requires increasing the sample size by a factor of $\frac{\pi}{2} \approx 1.571$ (about 57% more observations). Similarly, switching from $\operatorname{Mean}$ to $\operatorname{Center}$ requires only about 5% more observations.

The inverse squared drift (rightmost column) equals the classical statistical efficiency relative to the $\operatorname{Mean}$. The $\operatorname{Mean}$ achieves optimal performance (unit efficiency) for the $\underline{\operatorname{Additive}}$ (Normal) distribution, as expected from classical theory. The $\operatorname{Center}$ maintains 95.5% efficiency while offering greater robustness to outliers, making it an attractive alternative when some contamination is possible. The $\operatorname{Median}$, while most robust, operates at only 63.7% efficiency under purely $\underline{\operatorname{Additive}}$ (Normal) conditions.
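The sample-size factors and efficiencies quoted above follow from the drift values by simple arithmetic, sketched below. The dictionary `DRIFT` and the helpers `sample_size_factor` and `efficiency` are names introduced here for illustration.

```python
import math

# Asymptotic drift values for average estimators under Additive (Normal).
DRIFT = {
    "Mean": 1.0,
    "Median": math.sqrt(math.pi / 2.0),
    "Center": math.sqrt(math.pi / 3.0),
}


def sample_size_factor(from_name, to_name):
    """Factor by which n must grow to keep precision when switching estimators."""
    return (DRIFT[to_name] / DRIFT[from_name]) ** 2


def efficiency(name):
    """Efficiency relative to Mean: the inverse squared drift."""
    return 1.0 / DRIFT[name] ** 2


for name in DRIFT:
    print(name,
          round(sample_size_factor("Mean", name), 3),
          round(efficiency(name), 3))
```

The same ratio of squared drifts applies to any pair of estimators, so the helper can also compare, for example, Center against Median directly.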

Summary for dispersion estimators:

For the $\underline{\operatorname{Additive}}$ (Normal) distribution, the asymptotic drift values reveal the relative precision of different dispersion estimators:

| Estimator | $\operatorname{Drift}(E, X)$ | $\operatorname{Drift}^2(E, X)$ | $\frac{1}{\operatorname{Drift}^2(E, X)}$ |
| --- | --- | --- | --- |
| $\operatorname{StdDev}$ | $\approx 0.67$ | $\approx 0.45$ | $\approx 2.22$ |
| $\operatorname{MAD}$ | $\approx 1.10$ | $\approx 1.22$ | $\approx 0.82$ |
| $\operatorname{Spread}$ | $\approx 0.72$ | $\approx 0.52$ | $\approx 1.92$ |

The squared drift values indicate the sample size adjustment needed when switching estimators. For instance, switching from $\operatorname{StdDev}$ to $\operatorname{MAD}$ while maintaining the same precision requires increasing the sample size by a factor of $\frac{1.22}{0.45} \approx 2.71$ (more than doubling the observations). Similarly, switching from $\operatorname{StdDev}$ to $\operatorname{Spread}$ requires a factor of $\frac{0.52}{0.45} \approx 1.16$.

The $\operatorname{StdDev}$ achieves optimal performance for the $\underline{\operatorname{Additive}}$ (Normal) distribution. The $\operatorname{MAD}$ requires about 2.7 times more data to match the $\operatorname{StdDev}$ precision while offering greater robustness to outliers. The $\operatorname{Spread}$ requires about 1.16 times more data to match the $\operatorname{StdDev}$ precision under purely $\underline{\operatorname{Additive}}$ (Normal) conditions while maintaining robustness.