AvgSpread

\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}

Weighted average of dispersions (pooled scale).

Also known as — robust pooled standard deviation
Domain — any real numbers
Assumptions — sparity(x), sparity(y)
Unit — same as measurements
Caveat — $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) \neq \operatorname{Spread}(\mathbf{x} \cup \mathbf{y})$ (pooled scale, not concatenated spread)
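
A small worked case makes the caveat concrete. It assumes that $\operatorname{Spread}$ is the median of pairwise absolute differences (a definition not restated in this section), and the two samples are made up purely for illustration. The concatenated sample has the six pairwise differences $1, 1, 9, 10, 10, 11$, so its spread is dominated by the gap between the groups rather than by the variability inside them.

```latex
\begin{aligned}
\mathbf{x} &= (0, 1), \quad \mathbf{y} = (10, 11) \\
\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) &= \frac{2 \cdot 1 + 2 \cdot 1}{2 + 2} = 1 \\
\operatorname{Spread}(\mathbf{x} \cup \mathbf{y}) &= \operatorname{median}\{1, 1, 9, 10, 10, 11\} = 9.5
\end{aligned}
```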

Properties

Self-average $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{x}) = \operatorname{Spread}(\mathbf{x})$
Symmetry $\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \operatorname{AvgSpread}(\mathbf{y}, \mathbf{x})$
Scale equivariance $\operatorname{AvgSpread}(k \cdot \mathbf{x}, k \cdot \mathbf{y}) = \lvert k \rvert \cdot \operatorname{AvgSpread}(\mathbf{x}, \mathbf{y})$
Mixed scaling $\operatorname{AvgSpread}(k_1 \cdot \mathbf{x}, k_2 \cdot \mathbf{x}) = \frac{\lvert k_1 \rvert + \lvert k_2 \rvert}{2} \cdot \operatorname{Spread}(\mathbf{x})$
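
The mixed-scaling property follows from the definition in a few steps. The derivation below is a sketch that assumes $\operatorname{Spread}$ is itself scale equivariant, $\operatorname{Spread}(k \cdot \mathbf{x}) = \lvert k \rvert \cdot \operatorname{Spread}(\mathbf{x})$, and uses the fact that both arguments have the same size $n$, so each weight is $\frac{1}{2}$.

```latex
\begin{aligned}
\operatorname{AvgSpread}(k_1 \cdot \mathbf{x}, k_2 \cdot \mathbf{x})
  &= \frac{n \cdot \operatorname{Spread}(k_1 \cdot \mathbf{x}) + n \cdot \operatorname{Spread}(k_2 \cdot \mathbf{x})}{n + n} \\
  &= \frac{n \lvert k_1 \rvert \cdot \operatorname{Spread}(\mathbf{x}) + n \lvert k_2 \rvert \cdot \operatorname{Spread}(\mathbf{x})}{2n} \\
  &= \frac{\lvert k_1 \rvert + \lvert k_2 \rvert}{2} \cdot \operatorname{Spread}(\mathbf{x})
\end{aligned}
```

With $k_1 = 2$, $k_2 = 3$, and $\operatorname{Spread}(\mathbf{x}) = 6$, this gives $2.5 \cdot 6 = 15$, the value listed for demo-3 in the test suite.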

Example

AvgSpread(x, y) = 5 where Spread(x) = 6, Spread(y) = 4, n = m
AvgSpread(x, y) = AvgSpread(y, x)

$\operatorname{AvgSpread}$ provides a single number representing the typical variability across two groups. It combines the spread of both samples, giving more weight to larger samples since they provide more reliable estimates. This pooled spread serves as a common reference scale, essential for expressing a difference in relative terms. $\operatorname{Disparity}$ uses $\operatorname{AvgSpread}$ internally to normalize the shift into a scale-free effect size.

Algorithm

The $\operatorname{AvgSpread}$ function computes the weighted average of per-sample spreads:

\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}

The algorithm delegates to the Spread algorithm independently for each sample, then forms the weighted linear combination with weights $\frac{n}{n + m}$ and $\frac{m}{n + m}$.

using Pragmastat.Algorithms;
using Pragmastat.Exceptions;
using Pragmastat.Internal;
using Pragmastat.Metrology;

namespace Pragmastat.Estimators;

internal class AvgSpreadEstimator : ITwoSampleEstimator
{
  public static readonly AvgSpreadEstimator Instance = new();

  public Measurement Estimate(Sample x, Sample y)
  {
    Assertion.MatchedUnit(x, y);
    // Check validity for x (priority 0, subject x)
    Assertion.Validity(x, Subject.X);
    // Check validity for y (priority 0, subject y)
    Assertion.Validity(y, Subject.Y);
    // Check sparity for x (priority 2, subject x)
    Assertion.Sparity(x, Subject.X);
    // Check sparity for y (priority 2, subject y)
    Assertion.Sparity(y, Subject.Y);

    // Calculate spreads (using internal implementation since we already validated)
    var spreadX = FastSpread.Estimate(x.SortedValues, isSorted: true);
    var spreadY = FastSpread.Estimate(y.SortedValues, isSorted: true);
    return ((x.Size * spreadX + y.Size * spreadY) / (x.Size + y.Size)).WithUnitOf(x);
  }
}
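
For readers without the library at hand, the same weighting can be reproduced with a naive reference sketch. This is not the Pragmastat implementation: the class name `AvgSpreadSketch` is hypothetical, $\operatorname{Spread}$ is assumed to be the median of pairwise absolute differences and is computed by brute force in $O(n^2 \log n)$ instead of via `FastSpread`, and the unit, validity, and sparity checks are omitted.

```csharp
using System;

// Hypothetical reference sketch, not part of Pragmastat.
public static class AvgSpreadSketch
{
    // Assumed definition: median of |x_i - x_j| over all pairs i < j.
    public static double Spread(double[] x)
    {
        if (x.Length < 2)
            throw new ArgumentException("Spread needs at least two values.", nameof(x));

        // Collect all n * (n - 1) / 2 pairwise absolute differences.
        var diffs = new double[x.Length * (x.Length - 1) / 2];
        int k = 0;
        for (int i = 0; i < x.Length; i++)
            for (int j = i + 1; j < x.Length; j++)
                diffs[k++] = Math.Abs(x[i] - x[j]);

        // Median: middle element, or mean of the two middle elements.
        Array.Sort(diffs);
        int mid = diffs.Length / 2;
        return diffs.Length % 2 == 1
            ? diffs[mid]
            : (diffs[mid - 1] + diffs[mid]) / 2.0;
    }

    // Weighted average of the two spreads with weights n / (n + m) and m / (n + m).
    public static double AvgSpread(double[] x, double[] y)
    {
        double n = x.Length, m = y.Length;
        return (n * Spread(x) + m * Spread(y)) / (n + m);
    }
}
```

Calling `AvgSpreadSketch.AvgSpread(new double[] { 0, 3, 6, 9, 12 }, new double[] { 0, 2, 4, 6, 8 })` reproduces the base case from the example above and returns `5`.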

Tests

\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) = \frac{n \cdot \operatorname{Spread}(\mathbf{x}) + m \cdot \operatorname{Spread}(\mathbf{y})}{n + m}

The $\operatorname{AvgSpread}$ test suite contains 36 test cases (5 demo + 4 natural + 1 negative + 9 additive + 4 uniform + 1 composite + 12 unsorted). Since $\operatorname{AvgSpread}$ is a weighted average of two $\operatorname{Spread}$ estimates, tests validate both the individual spread calculations and the weighting formula.

Demo examples ( $n = m = 5$ ) from the manual introduction, validating properties:

demo-1: $\mathbf{x} = (0, 3, 6, 9, 12)$ , $\mathbf{y} = (0, 2, 4, 6, 8)$ , expected output: $5$ (base case)
demo-2: $\mathbf{x} = (0, 3, 6, 9, 12)$ , $\mathbf{y} = (0, 3, 6, 9, 12)$ , expected output: $6$ (equal samples)
demo-3: $\mathbf{x} = (0, 6, 12, 18, 24)$ , $\mathbf{y} = (0, 9, 18, 27, 36)$ , expected output: $15$ (scale equivariance, $3 \times$ demo-4)
demo-4: $\mathbf{x} = (0, 2, 4, 6, 8)$ , $\mathbf{y} = (0, 3, 6, 9, 12)$ , expected output: $5$ (swap symmetry with demo-1)
demo-5: $\mathbf{x} = (0, 6, 12, 18, 24)$ , $\mathbf{y} = (0, 4, 8, 12, 16)$ , expected output: $10$ (scale, $2 \times$ demo-1)
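
The expected value of demo-1 can be traced by hand. The computation below is a sketch that assumes $\operatorname{Spread}$ is the median of the pairwise absolute differences within a sample. Each sample of size 5 yields 10 such differences, and the median is the mean of the 5th and 6th smallest.

```latex
\begin{aligned}
\operatorname{Spread}(\mathbf{x}) &= \operatorname{median}\{3, 3, 3, 3, 6, 6, 6, 9, 9, 12\} = 6 \\
\operatorname{Spread}(\mathbf{y}) &= \operatorname{median}\{2, 2, 2, 2, 4, 4, 4, 6, 6, 8\} = 4 \\
\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) &= \frac{5 \cdot 6 + 5 \cdot 4}{5 + 5} = 5
\end{aligned}
```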

Natural sequences ( $[n, m] \in \{2, 3\} \times \{2, 3\}$ ), 4 combinations:

natural-2-2, natural-2-3, natural-3-2, natural-3-3
Minimum size $n, m \geq 2$ required for meaningful dispersion

Negative values ( $[n, m] = [2, 2]$ ), sign handling validation:

negative-2-2: $\mathbf{x} = (-2, -1)$ , $\mathbf{y} = (-2, -1)$ , expected output: $1$

Additive distribution ( $[n, m] \in \{5, 10, 30\} \times \{5, 10, 30\}$ ), 9 combinations with $\underline{\operatorname{Additive}}(10, 1)$ :

additive-5-5, additive-5-10, additive-5-30
additive-10-5, additive-10-10, additive-10-30
additive-30-5, additive-30-10, additive-30-30
Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Uniform distribution ( $[n, m] \in \{5, 100\} \times \{5, 100\}$ ), 4 combinations with $\underline{\operatorname{Uniform}}(0, 1)$ :

uniform-5-5, uniform-5-100, uniform-100-5, uniform-100-100
Random generation: $\mathbf{x}$ uses seed 0, $\mathbf{y}$ uses seed 1

Composite estimator stress test (1 test):

composite-asymmetric-weights: $\mathbf{x} = (1, 2)$ , $\mathbf{y} = (3, 4, 5, 6, 7, 8, 9, 10)$ ( $n = 2$ , $m = 8$ , highly asymmetric weights $w_x = 0.2$ , $w_y = 0.8$ )
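
The source does not list the expected output for this case, but it can be worked out under the assumption that $\operatorname{Spread}$ is the median of pairwise absolute differences. The sample $\mathbf{y}$ consists of 8 consecutive integers, which gives 28 pairwise differences: seven equal to 1, six equal to 2, five equal to 3, and so on down to a single 7. The 14th and 15th smallest are both 3, so the median is 3.

```latex
\begin{aligned}
\operatorname{Spread}(\mathbf{x}) &= \lvert 1 - 2 \rvert = 1 \\
\operatorname{Spread}(\mathbf{y}) &= 3 \\
\operatorname{AvgSpread}(\mathbf{x}, \mathbf{y}) &= \frac{2 \cdot 1 + 8 \cdot 3}{2 + 8} = 0.2 \cdot 1 + 0.8 \cdot 3 = 2.6
\end{aligned}
```

The result sits much closer to $\operatorname{Spread}(\mathbf{y})$ than to the unweighted mean of 2, which is what the asymmetric weights are meant to exercise.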

Unsorted tests verify sorting independence (12 tests):

unsorted-x-natural-{n}-{m} for $(n,m) \in \{(3,3), (4,4)\}$ : X unsorted (reversed), Y sorted (2 tests)
unsorted-y-natural-{n}-{m} for $(n,m) \in \{(3,3), (4,4)\}$ : X sorted, Y unsorted (reversed) (2 tests)
unsorted-both-natural-{n}-{m} for $(n,m) \in \{(3,3), (4,4)\}$ : both unsorted (reversed) (2 tests)
unsorted-demo-unsorted-x: demo-1 with X unsorted
unsorted-demo-unsorted-y: demo-1 with Y unsorted
unsorted-demo-both-unsorted: demo-1 with both unsorted
unsorted-identity-unsorted: equal samples, both unsorted
unsorted-negative-unsorted: negative values, both unsorted
unsorted-asymmetric-weights-unsorted: asymmetric weights, both unsorted
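
The idea behind the unsorted cases can be sketched as a small invariance check. The code below is illustrative only: `SortIndependenceSketch` and `ReversalInvariant` are hypothetical names, the spread is a brute-force version that assumes the median-of-pairwise-differences definition, and none of it is taken from the actual test harness.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a sorting-independence check, not the library's test code.
public static class SortIndependenceSketch
{
    // Naive spread (assumed definition): median of |x_i - x_j| over pairs i < j.
    private static double Spread(double[] x)
    {
        var d = (from i in Enumerable.Range(0, x.Length)
                 from j in Enumerable.Range(i + 1, x.Length - i - 1)
                 select Math.Abs(x[i] - x[j]))
                .OrderBy(v => v)
                .ToArray();
        // The two indices coincide for odd lengths and straddle the middle for even ones.
        return (d[(d.Length - 1) / 2] + d[d.Length / 2]) / 2.0;
    }

    public static double AvgSpread(double[] x, double[] y) =>
        (x.Length * Spread(x) + y.Length * Spread(y)) / (x.Length + y.Length);

    // Reverses x, y, or both, and checks that the result is unchanged.
    public static bool ReversalInvariant(double[] x, double[] y)
    {
        var rx = Enumerable.Reverse(x).ToArray();
        var ry = Enumerable.Reverse(y).ToArray();
        double expected = AvgSpread(x, y);
        return AvgSpread(rx, y) == expected
            && AvgSpread(x, ry) == expected
            && AvgSpread(rx, ry) == expected;
    }
}
```

Exact equality is safe in this sketch because reversing a sample leaves the multiset of pairwise absolute differences unchanged, so the sorted differences and the median are bit-for-bit identical.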

As a composite estimator, $\operatorname{AvgSpread}$ tests both individual $\operatorname{Spread}$ computations and the weighted combination. Unsorted variants verify end-to-end correctness including the weighting formula.